Science.gov

Sample records for acid sequence coded

  1. Phenolic acid esterases, coding sequences and methods

    DOEpatents

    Blum, David L.; Kataeva, Irina; Li, Xin-Liang; Ljungdahl, Lars G.

    2002-01-01

    Described herein are four phenolic acid esterases, three of which correspond to domains of previously unknown function within bacterial xylanases, from XynY and XynZ of Clostridium thermocellum and from a xylanase of Ruminococcus. The fourth specifically exemplified xylanase is a protein encoded within the genome of Orpinomyces PC-2. The amino acids of these polypeptides and nucleotide sequences encoding them are provided. Recombinant host cells, expression vectors and methods for the recombinant production of phenolic acid esterases are also provided.

  2. Amino acid repeats cause extraordinary coding sequence variation in the social amoeba Dictyostelium discoideum.

    PubMed

    Scala, Clea; Tian, Xiangjun; Mehdiabadi, Natasha J; Smith, Margaret H; Saxer, Gerda; Stephens, Katie; Buzombo, Prince; Strassmann, Joan E; Queller, David C

    2012-01-01

    Protein sequences are normally the most conserved elements of genomes owing to purifying selection to maintain their functions. We document an extraordinary amount of within-species protein sequence variation in the model eukaryote Dictyostelium discoideum stemming from triplet DNA repeats coding for long strings of single amino acids. D. discoideum has a very large number of such strings, many of which are polyglutamine repeats, the same sequence that causes various neurological disorders in humans, such as Huntington's disease. We show here that D. discoideum coding repeat loci are highly variable among individuals, making D. discoideum a candidate for the most variable proteome. The coding repeat loci are not significantly less variable than similar non-coding triplet repeats. This pattern is consistent with these amino-acid repeats being largely non-functional sequences evolving primarily by mutation and drift. PMID:23029418

  3. Cellulases and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-02-20

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  4. Cellulases and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-01-01

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  5. Lichenase and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2000-08-15

    The present invention provides a fungal lichenase, i.e., an endo-1,3-1,4-β-D-glucanohydrolase, its coding sequence, recombinant DNA molecules comprising the lichenase coding sequences, recombinant host cells and methods for producing same. The present lichenase is from Orpinomyces PC-2.

  6. Mutation-selection models of coding sequence evolution with site-heterogeneous amino acid fitness profiles.

    PubMed

    Rodrigue, Nicolas; Philippe, Hervé; Lartillot, Nicolas

    2010-03-01

    Modeling the interplay between mutation and selection at the molecular level is key to evolutionary studies. To this end, codon-based evolutionary models have been proposed as pertinent means of studying long-range evolutionary patterns and are widely used. However, these approaches have not yet consolidated results from amino acid level phylogenetic studies showing that selection acting on proteins displays strong site-specific effects, which translate into heterogeneous amino acid propensities across the columns of alignments; related codon-level studies have instead focused on either modeling a single selective context for all codon columns, or a separate selective context for each codon column, with the former strategy deemed too simplistic and the latter deemed overparameterized. Here, we integrate recent developments in nonparametric statistical approaches to propose a probabilistic model that accounts for the heterogeneity of amino acid fitness profiles across the coding positions of a gene. We apply the model to a dozen real protein-coding gene alignments and find it to produce biologically plausible inferences, for instance, as pertaining to site-specific amino acid constraints, as well as distributions of scaled selection coefficients. In their account of mutational features as well as the heterogeneous regimes of selection at the amino acid level, the modeling approaches studied here can form a backdrop for several extensions, accounting for other selective features, for variable population size, or for subtleties of mutational features, all with parameterizations couched within population-genetic theory. PMID:20176949

  7. An Interpretation of the Ancestral Codon from Miller’s Amino Acids and Nucleotide Correlations in Modern Coding Sequences

    PubMed Central

    Carels, Nicolas; de Leon, Miguel Ponce

    2015-01-01

    Purine bias, which is usually referred to as an “ancestral codon”, is known to result in short-range correlations between nucleotides in coding sequences, and it is common in all species. We demonstrate that RWY is a more appropriate pattern than the classical RNY, and purine bias (Rrr) is the product of a network of nucleotide compensations induced by functional constraints on the physicochemical properties of proteins. Through deductions from universal correlation properties, we also demonstrate that amino acids from Miller’s spark discharge experiment are compatible with functional primeval proteins at the dawn of living cell radiation on earth. These amino acids match the hydropathy and secondary structures of modern proteins. PMID:25922573

  8. Nucleotide and derived amino acid sequences of a cDNA coding for pre-uteroglobin from the lung of the hare (Lepus capensis).

    PubMed Central

    López de Haro, M S; Nieto, A

    1986-01-01

    An almost full-length cDNA coding for pre-uteroglobin from hare lung was cloned and sequenced. The derived amino acid sequence indicated that hare pre-uteroglobin contained 91 amino acids, including a signal peptide of 21 residues. Comparison of the nucleotide sequence of hare pre-uteroglobin cDNA with that previously reported for the rabbit gene indicated five silent point substitutions and six others leading to amino acid changes in the coding region. The untranslated regions of both pre-uteroglobin mRNAs were very similar. The amino acid changes observed are discussed in relation to the different progesterone-binding abilities of both homologous proteins. PMID:3019311
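
    The distinction drawn above between silent substitutions and those that change an amino acid can be illustrated with a minimal Python sketch. It assumes the standard genetic code and two already aligned, gap-free coding sequences of equal length; the function name and the toy sequences are hypothetical and are not the hare or rabbit data. Note that it classifies differing codons rather than individual point substitutions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# organelle or alternative code is involved).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_codon_differences(cds_a, cds_b):
    """Compare two aligned coding sequences codon by codon.

    Returns (silent, replacement); each entry is
    (codon_number, codon_a, codon_b, amino_acid_a, amino_acid_b).
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be aligned, equal in length, and a multiple of 3")
    silent, replacement = [], []
    for i in range(0, len(cds_a), 3):
        a, b = cds_a[i:i + 3].upper(), cds_b[i:i + 3].upper()
        if a == b:
            continue
        aa_a, aa_b = CODON_TABLE[a], CODON_TABLE[b]
        target = silent if aa_a == aa_b else replacement
        target.append((i // 3 + 1, a, b, aa_a, aa_b))
    return silent, replacement


# Hypothetical toy sequences: codon 2 differs silently, codon 4 changes K to R.
silent, replacement = classify_codon_differences("ATGGACCTGAAA", "ATGGATCTGAGA")
print(silent)
print(replacement)
```

    Counting individual point substitutions, as the abstract does, would additionally require comparing the differing codons base by base.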

  9. Image Sequence Coding by Octrees

    NASA Astrophysics Data System (ADS)

    Leonardi, Riccardo

    1989-11-01

    This work addresses the problem of representing an image sequence as a set of octrees. The purpose is to generate a flexible data structure to model video signals, for applications such as motion estimation, video coding and/or analysis. An image sequence can be represented as a 3-dimensional causal signal, which becomes a 3 dimensional array of data when the signal has been digitized. If it is desirable to track long-term spatio-temporal correlation, a series of octree structures may be embedded on this 3D array. Each octree looks at a subset of data in the spatio-temporal space. At the lowest level (leaves of the octree), adjacent pixels of neighboring frames are captured. A combination of these is represented at the parent level of each group of 8 children. This combination may result in a more compact representation of the information of these pixels (coding application) or in a local estimate of some feature of interest (e.g., velocity, classification, object boundary). This combination can be iterated bottom-up to get a hierarchical description of the image sequence characteristics. A coding strategy using such data structure involves the description of the octree shape using one bit per node except for leaves of the tree located at the lowest level, and the value (or parametric model) assigned to each one of these leaves. Experiments have been performed to represent Common Image Format (CIF) sequences.
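
    The shape-plus-leaf-values coding strategy described above can be sketched roughly as follows. This is a simplified illustration, not the author's method: it assumes a cubic block of frames whose side is a power of two, and it merges eight children only when all their samples are identical (a lossless stand-in for the "combination" or parametric model mentioned in the abstract). All function names are hypothetical.

```python
def build_octree(vol, t0, y0, x0, size):
    """Return a leaf value, or a list of 8 subtrees, for a cubic block of vol[t][y][x]."""
    first = vol[t0][y0][x0]
    if size == 1 or all(
        vol[t][y][x] == first
        for t in range(t0, t0 + size)
        for y in range(y0, y0 + size)
        for x in range(x0, x0 + size)
    ):
        return first  # homogeneous block becomes a leaf
    half = size // 2
    return [
        build_octree(vol, t0 + dt, y0 + dy, x0 + dx, half)
        for dt in (0, half)
        for dy in (0, half)
        for dx in (0, half)
    ]


def shape_bits(node, size):
    """Depth-first shape code: '1' for a split node, '0' for a leaf.

    Leaves at the lowest level (size 1) cannot split, so they need no bit.
    """
    if isinstance(node, list):
        return "1" + "".join(shape_bits(child, size // 2) for child in node)
    return "" if size == 1 else "0"


def leaf_values(node):
    """Collect leaf values in the same depth-first order as the shape code."""
    if isinstance(node, list):
        return [v for child in node for v in leaf_values(child)]
    return [node]


# Toy 4-frame, 4x4 sequence: constant background with a single differing sample.
vol = [[[0] * 4 for _ in range(4)] for _ in range(4)]
vol[0][0][0] = 9
tree = build_octree(vol, 0, 0, 0, 4)
print(shape_bits(tree, 4), leaf_values(tree))
```

    In this toy case the 64 samples are described by 9 shape bits and 15 leaf values; a lossy coder would replace the equality test with a tolerance or a fitted model.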

  10. Molecular cloning, coding nucleotides and the deduced amino acid sequence of P-450BM-1 from Bacillus megaterium.

    PubMed

    He, J S; Ruettinger, R T; Liu, H M; Fulco, A J

    1989-12-22

    The gene encoding barbiturate-inducible cytochrome P-450BM-1 from Bacillus megaterium ATCC 14581 has been cloned and sequenced. An open reading frame in the 1.9 kb of cloned DNA correctly predicted the NH2-terminal sequence of P-450BM-1 previously determined by protein sequencing, and, in toto, predicted a polypeptide of 410 amino acid residues with an Mr of 47,439. The sequence is most, but less than 27%, similar to that of P-450CAM from Pseudomonas putida, so that P-450BM-1 clearly belongs to a new P-450-gene family, distinct especially from that of the P-450 domain of P-450BM-3, a barbiturate-inducible single polypeptide cytochrome P-450:NADPH-P-450 reductase from the same strain of B. megaterium (Ruettinger, R.T., Wen, L.-P. and Fulco, A.J. (1989) J. Biol. Chem. 264, 10987-10995). PMID:2597681

  11. Variation in seed fatty acid composition and sequence divergence in the FAD2 gene coding region between wild and cultivated sesame.

    PubMed

    Chen, Zhenbang; Tonnis, Brandon; Morris, Brad; Wang, Richard B; Zhang, Amy L; Pinnow, David; Wang, Ming Li

    2014-12-01

    Sesame germplasm harbors genetic diversity which can be useful for sesame improvement in breeding programs. Seven accessions with different levels of oleic acid were selected from the entire USDA sesame germplasm collection (1232 accessions) and planted for morphological observation and re-examination of fatty acid composition. The coding region of the FAD2 gene for fatty acid desaturase (FAD) in these accessions was also sequenced. Cultivated sesame accessions flowered and matured earlier than the wild species. The cultivated sesame seeds contained a significantly higher percentage of oleic acid (40.4%) than the seeds of the wild species (26.1%). Nucleotide polymorphisms were identified in the FAD2 gene coding region between wild and cultivated species. Some nucleotide polymorphisms led to amino acid changes, one of which was located in the enzyme active site and may contribute to the altered fatty acid composition. Based on the morphology observation, chemical analysis, and sequence analysis, it was determined that two accessions were misnamed and need to be reclassified. The results obtained from this study are useful for sesame improvement in molecular breeding programs.

  12. Numerical classification of coding sequences

    NASA Technical Reports Server (NTRS)

    Collins, D. W.; Liu, C. C.; Jukes, T. H.

    1992-01-01

    DNA sequences coding for protein may be represented by counts of nucleotides or codons. A complete reading frame may be abbreviated by its base count, e.g. A76C158G121T74, or with the corresponding codon table, e.g. (AAA)0(AAC)1(AAG)9 ... (TTT)0. We propose that these numerical designations be used to augment current methods of sequence annotation. Because base counts and codon tables do not require revision as knowledge of function evolves, they are well-suited to act as cross-references, for example to identify redundant GenBank entries. These descriptors may be compared, in place of DNA sequences, to extract homologous genes from large databases. This approach permits rapid searching with good selectivity.
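
    As an illustration of the two descriptors mentioned above, the following Python sketch derives a base-count string and a codon table from a reading frame. The function names and the toy sequence are hypothetical, and the output format simply imitates the examples quoted in the abstract.

```python
from collections import Counter
from itertools import product


def base_count_descriptor(seq):
    """Return a base-count string such as 'A7C1G2T2' for a DNA sequence."""
    counts = Counter(seq.upper())
    return "".join(f"{base}{counts.get(base, 0)}" for base in "ACGT")


def codon_table_descriptor(seq):
    """Return codon counts in alphabetical codon order, e.g. '(AAA)0(AAC)1...'."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    all_codons = ("".join(p) for p in product("ACGT", repeat=3))
    return "".join(f"({codon}){counts.get(codon, 0)}" for codon in all_codons)


orf = "ATGAAGAACTAA"  # toy reading frame: ATG AAG AAC TAA
print(base_count_descriptor(orf))
print(codon_table_descriptor(orf)[:21])
```

    Two database entries with identical descriptors are candidates for being redundant; confirming that would still require comparing the sequences themselves.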

  13. Orpinomyces cellulase celf protein and coding sequences

    SciTech Connect

    Li, Xin-Liang; Chen, Huizhong; Ljungdahl, Lars G.

    2000-09-05

    A cDNA (1,520 bp), designated celF, consisting of an open reading frame (ORF) encoding a polypeptide (CelF) of 432 amino acids was isolated from a cDNA library of the anaerobic rumen fungus Orpinomyces PC-2 constructed in Escherichia coli. Analysis of the deduced amino acid sequence showed that, starting from the N-terminus, CelF consists of a signal peptide and a cellulose binding domain (CBD), followed by an extremely Asn-rich linker region which separates the CBD from the catalytic domain. The latter is located at the C-terminus. The catalytic domain of CelF is highly homologous to CelA and CelC of Orpinomyces PC-2, to CelA of Neocallimastix patriciarum and also to cellobiohydrolase IIs (CBHIIs) from aerobic fungi. However, like CelA of Neocallimastix patriciarum, CelF does not have the noncatalytic repeated peptide domain (NCRPD) found in CelA and CelC from the same organism. The recombinant protein CelF hydrolyzes cellooligosaccharides in the pattern of CBHII, yielding only cellobiose as product with cellotetraose as the substrate. The genomic celF is interrupted by a 111 bp intron, located within the region coding for the CBD. The intron of celF has features in common with genes from aerobic filamentous fungi.

  14. Cloning, nucleotide sequence, and regulation of the Bacillus subtilis gpr gene, which codes for the protease that initiates degradation of small, acid-soluble proteins during spore germination.

    PubMed Central

    Sussman, M D; Setlow, P

    1991-01-01

    The gpr gene, which codes for the protease that initiates degradation of small, acid-soluble proteins during spore germination, has been cloned from Bacillus megaterium and Bacillus subtilis, and its nucleotide sequence has been determined. Use of a translational gpr-lacZ fusion showed that the B. subtilis gpr gene was expressed primarily, if not exclusively, in the forespore compartment of the sporulating cell, with expression taking place approximately 1 h before expression of glucose dehydrogenase and ssp genes. gpr-lacZ expression was abolished in spoIIAC (sigF) and spoIIIE mutants but was reduced only approximately 50% in a spoIIIG (sigG) mutant. However, the kinetics of the initial approximately 50% of gpr-lacZ expression were unaltered in a spoIIIG mutant. The in vivo transcription start site of gpr has been identified and found to be identical to the in vitro start site on this gene with either E sigma F or E sigma G. Induction of sigma G synthesis in vivo turned on gpr-lacZ expression in parallel with synthesis of glucose dehydrogenase. These data are consistent with gpr transcription during sporulation first by E sigma F and then by E sigma G. PMID:1840582

  15. Nonspatial Sequence Coding in CA1 Neurons

    PubMed Central

    Allen, Timothy A.; Salz, Daniel M.; McKenzie, Sam

    2016-01-01

    The hippocampus is critical to the memory for sequences of events, a defining feature of episodic memory. However, the fundamental neuronal mechanisms underlying this capacity remain elusive. While considerable research indicates hippocampal neurons can represent sequences of locations, direct evidence of coding for the memory of sequential relationships among nonspatial events remains lacking. To address this important issue, we recorded neural activity in CA1 as rats performed a hippocampus-dependent sequence-memory task. Briefly, the task involves the presentation of repeated sequences of odors at a single port and requires rats to identify each item as “in sequence” or “out of sequence”. We report that, while the animals' location and behavior remained constant, hippocampal activity differed depending on the temporal context of items—in this case, whether they were presented in or out of sequence. Some neurons showed this effect across items or sequence positions (general sequence cells), while others exhibited selectivity for specific conjunctions of item and sequence position information (conjunctive sequence cells) or for specific probe types (probe-specific sequence cells). We also found that the temporal context of individual trials could be accurately decoded from the activity of neuronal ensembles, that sequence coding at the single-cell and ensemble level was linked to sequence memory performance, and that slow-gamma oscillations (20–40 Hz) were more strongly modulated by temporal context and performance than theta oscillations (4–12 Hz). These findings provide compelling evidence that sequence coding extends beyond the domain of spatial trajectories and is thus a fundamental function of the hippocampus. SIGNIFICANCE STATEMENT The ability to remember the order of life events depends on the hippocampus, but the underlying neural mechanisms remain poorly understood. Here we addressed this issue by recording neural activity in hippocampal

  16. CROSS-DISCIPLINARY PHYSICS AND RELATED AREAS OF SCIENCE AND TECHNOLOGY: The structural analysis of protein sequences based on the quasi-amino acids code

    NASA Astrophysics Data System (ADS)

    Zhu, Ping; Tang, Xu-Qing; Xu, Zhen-Yuan

    2009-01-01

    Proteomics is the study of proteins and their interactions in a cell. With the successful completion of the Human Genome Project comes the post-genome era, in which proteomics technology is emerging. This paper studies protein molecules from an algebraic point of view. The algebraic system (Σ, +, *) is introduced, where Σ is the set of 64 codons. According to the characteristics of (Σ, +, *), a novel quasi-amino acids code classification method is introduced and the corresponding algebraic operation table over the set ZU of the 16 kinds of quasi-amino acids is established. The internal relations among the quasi-amino acids are revealed. The results show that there exist some very close correlations between the properties of the quasi-amino acids and the codons. These correlations may play an important part in establishing the logical relationship between codons and the quasi-amino acids during the origin of life. According to Ma F et al (2003 J. Anhui Agricultural University 30 439), the corresponding relation and the excellent properties about amino acids code are very difficult to observe. The present paper shows that (ZU, ⊕, ⊗) is a field. Furthermore, the operational results show that the codon tga has different properties from the other stop codons. In fact, in the human and ox mitochondrial genetic code, tga codes for tryptophan rather than acting as a stop codon as in other genetic codes, as noted by Chen W C et al (2002 Acta Biophysica Sinica 18(1) 87). The present theory avoids some inexplicable events of the 20 kinds of amino acids code, in other words it solves the problem of 'the 64 codon assignments of mRNA to amino acids is probably completely wrong' proposed by Yang (2006 Progress in Modern Biomedicine 6 3).

  17. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    SciTech Connect

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in

  18. High compression image and image sequence coding

    NASA Technical Reports Server (NTRS)

    Kunt, Murat

    1989-01-01

    The digital representation of an image requires a very large number of bits. This number is even larger for an image sequence. The goal of image coding is to reduce this number, as much as possible, and reconstruct a faithful duplicate of the original picture or image sequence. Early efforts in image coding, solely guided by information theory, led to a plethora of methods. The compression ratio reached a plateau around 10:1 a couple of years ago. Recent progress in the study of the brain mechanism of vision and scene analysis has opened new vistas in picture coding. Directional sensitivity of the neurones in the visual pathway combined with the separate processing of contours and textures has led to a new class of coding methods capable of achieving compression ratios as high as 100:1 for images and around 300:1 for image sequences. Recent progress on some of the main avenues of object-based methods is presented. These second generation techniques make use of contour-texture modeling, new results in neurophysiology and psychophysics and scene analysis.

  19. Coding sequence density estimation via topological pressure.

    PubMed

    Koslicki, David; Thompson, Daniel J

    2015-01-01

    We give a new approach to coding sequence (CDS) density estimation in genomic analysis based on the topological pressure, which we develop from a well known concept in ergodic theory. Topological pressure measures the 'weighted information content' of a finite word, and incorporates 64 parameters which can be interpreted as a choice of weight for each nucleotide triplet. We train the parameters so that the topological pressure fits the observed coding sequence density on the human genome, and use this to give ab initio predictions of CDS density over windows of size around 66,000 bp on the genomes of Mus musculus, rhesus macaque and Drosophila melanogaster. While the differences between these genomes are too great to expect that training on the human genome could predict, for example, the exact locations of genes, we demonstrate that our method gives reasonable estimates for the 'coarse scale' problem of predicting CDS density. Inspired again by ergodic theory, the weightings of the nucleotide triplets obtained from our training procedure are used to define a probability distribution on finite sequences, which can be used to distinguish between intron and exon sequences from the human genome of lengths between 750 and 5,000 bp. At the end of the paper, we explain the theoretical underpinning for our approach, which is the theory of Thermodynamic Formalism from the dynamical systems literature. Mathematica and MATLAB implementations of our method are available at http://sourceforge.net/projects/topologicalpres/ . PMID:24448658

  20. Composition for nucleic acid sequencing

    SciTech Connect

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2008-08-26

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  1. The Cipher Code of Simple Sequence Repeats in "Vampire Pathogens".

    PubMed

    Zou, Geng; Bello-Orti, Bernardo; Aragon, Virginia; Tucker, Alexander W; Luo, Rui; Ren, Pinxing; Bi, Dingren; Zhou, Rui; Jin, Hui

    2015-07-28

    Blood inside mammals is a forbidden area for the majority of prokaryotic microbes; however, microbes with red blood cell tropism, such as "vampire pathogens" (VP), succeed in coping with scarce nutrients and surviving strong immune reactions. Here, we found that VP of Mycoplasma, Rhizobiales, and Rickettsiales showed significantly higher counts of (AG)n dimeric simple sequence repeats (Di-SSRs) in their genomes, coding and non-coding regions than non-vampire pathogens (N_VP). Regression analysis indicated a significant correlation between GC content and the span of (AG)n-Di-SSR variation. Gene Ontology (GO) terms with abundance of (AG)3-Di-SSRs shared by the VP strains were associated with purine nucleotide metabolism (FDR < 0.01), indicating an adaptation to the limited availability of purine and nucleotide precursors in blood. Di-amino acids coded by (AG)n-Di-SSRs included all three six-fold code amino acids (Arg, Leu and Ser), and counts of Di-amino acids coded by (AG)3, (GA)3, and (TC)3 were significantly higher in VP than in N_VP. Furthermore, significant differences (P < 0.001) in the numbers of triplexes formed from (AG)n-Di-SSRs between VP and N_VP in Mycoplasma suggested a potential role of (AG)n-Di-SSRs in gene regulation.
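
    Counting (AG)n dimeric repeats of the kind discussed above can be sketched with a regular expression. This is only an illustration: the minimum of three repeat units, the function name and the toy sequence are assumptions, and the study's own detection criteria are not given in the abstract.

```python
import re


def find_di_ssrs(seq, unit="AG", min_repeats=3):
    """Return (start, repeat_count) for each maximal run of `unit` with >= min_repeats copies."""
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_repeats},}}")
    return [
        (match.start(), len(match.group()) // len(unit))
        for match in pattern.finditer(seq.upper())
    ]


# Toy sequence with runs of 3, 4 and 2 AG units; only the first two qualify.
genome = "TTAGAGAGCCGAGAGAGAGTTAGAG"
print(find_di_ssrs(genome))
```

    Comparing such counts between groups of genomes would also need normalisation for genome length and base composition, which this sketch leaves out.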

  2. Whole Genome Sequencing: Cracking the Genetic Code for Foodborne Illness

    MedlinePlus

    ... Bacteria that cause disease have millions of different genomes, or sequences of genetic code, each as unique ...

  3. Sorbitol dehydrogenase. Full-length cDNA sequencing reveals a mRNA coding for a protein containing an additional 42 amino acids at the N-terminal end.

    PubMed

    Wen, Y; Bekhor, I

    1993-10-01

    A cDNA clone encoding rat sorbitol dehydrogenase (SDH) was isolated from a rat testis lambda ZAP II cDNA library. The full-length cDNA insert contained 2277 base pairs (bp), starting 182 bp upstream from an ATG codon where translation to the active enzyme SDH is presumed to be initiated. A second ATG codon, however, was found 126 bp upstream, aligned in the same reading frame as that of the active enzyme. Therefore, the coding sequence for SDH can be translated into an additional 42-amino-acid polypeptide linked to the N-terminal amino acid of the enzyme, generating a pre-sorbitol dehydrogenase. The sequence data indicate that the nucleotide environment around this ATG codon is more favorable towards it being the actual open reading frame (ORF) for a pre-SDH than the ATG codon preceding the nucleotide sequence for SDH. Since no known SDH starts with the additional 42 amino acids, it may be that post-translational removal of this polypeptide accompanies the release of the active enzyme. Next, the 3' untranslated region of the cDNA contained a non-coding 1021 bp downstream from the TAA stop codon. The latter sequence included three putative poly(A) signals: one at nucleotides 1362-1367, the second at nucleotides 1465-1470, and the third at nucleotides 2212-2217 [17 bp away from the poly(A) tail]. In addition to the above findings we also report a variance in one of the amino acids in the SDH cDNA sequence. This variance occurs at position 957-960, where threonine is coded for instead of aspartic acid; in the rat testis SDH cDNA, we find the sequence is ACG instead of GAC, as was reported for the rat liver SDH cDNA. Northern-blot hybridization analysis showed that SDH mRNA is a doublet, one band of 4 kb and the other of 2.3-2.4 kb, in both the rat liver and the rat lens, further confirming that the isolated SDH cDNA constituted a full-length cDNA.
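
    The relationship between the upstream ATG and the 42-residue extension is a matter of reading-frame arithmetic, which the following small Python sketch makes explicit. The function name is hypothetical; only the 126 bp offset comes from the abstract.

```python
def in_frame_extension(upstream_bp):
    """Return the number of extra codons added by a start codon `upstream_bp` before the known ATG.

    Raises ValueError if the upstream start is not in the same reading frame.
    """
    if upstream_bp % 3 != 0:
        raise ValueError("upstream start codon is not in the same reading frame")
    return upstream_bp // 3


# 126 bp upstream, 3 bp per codon: 42 additional residues.
print(in_frame_extension(126))
```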

  4. High speed nucleic acid sequencing

    SciTech Connect

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescent resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.

  5. A downstream regulatory element located within the coding sequence mediates autoregulated expression of the yeast fatty acid synthase gene FAS2 by the FAS1 gene product.

    PubMed

    Wenz, P; Schwank, S; Hoja, U; Schüller, H J

    2001-11-15

    The fatty acid synthase genes FAS1 and FAS2 of the yeast Saccharomyces cerevisiae are transcriptionally co-regulated by general transcription factors (such as Reb1, Rap1 and Abf1) and by the phospholipid-specific heterodimeric activator Ino2/Ino4, acting via their corresponding upstream binding sites. Here we provide evidence for a positive autoregulatory influence of FAS1 on FAS2 expression. Even with a constant FAS2 copy number, a 10-fold increase of FAS2 transcript amount was observed in the presence of FAS1 in multi-copy, compared to a fas1 null mutant. Surprisingly, the first 66 nt of the FAS2 coding region turned out to be necessary and sufficient for FAS1-dependent gene expression. FAS2-lacZ fusion constructs deleted for this region showed high reporter gene expression even in the absence of FAS1, arguing for a negatively-acting downstream repression site (DRS) responsible for FAS1-dependent expression of FAS2. Our data suggest that the FAS1 gene product, in addition to its catalytic function, is also required for the coordinate biosynthetic control of the yeast FAS complex. An excess of uncomplexed Fas1 may be responsible for the deactivation of an FAS2-specific repressor, acting via the DRS. PMID:11713312

  6. SEQassembly: A Practical Tools Program for Coding Sequences Splicing

    NASA Astrophysics Data System (ADS)

    Lee, Hongbin; Yang, Hang; Fu, Lei; Qin, Long; Li, Huili; He, Feng; Wang, Bo; Wu, Xiaoming

    A CDS (coding sequence) is the portion of an mRNA sequence that is composed of a number of exon sequence segments. Constructing the CDS is important for in-depth genetic analyses such as genotyping. A program in the MATLAB environment is presented that processes batches of sample sequences into code segments under the guidance of reference exon models, and splices the code segments from the same sample source into a CDS according to the exon order in a queue file. The program is useful for transcriptional polymorphism detection and gene function studies.
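
    The splicing step described above can be sketched as follows. This is a minimal illustration in Python rather than MATLAB, and it is not the SEQassembly program itself: the function name `splice_cds`, the dictionary layout, and the toy sequences are all assumptions made for the example.

```python
def splice_cds(exon_segments, exon_order):
    """Concatenate exon segments into one CDS, following exon_order.

    exon_segments: dict mapping a (hypothetical) exon name to its sequence.
    exon_order: list of exon names in the order they should be joined,
                standing in for the "queue file" mentioned in the abstract.
    """
    missing = [name for name in exon_order if name not in exon_segments]
    if missing:
        raise KeyError(f"missing exon segments: {missing}")
    return "".join(exon_segments[name].upper() for name in exon_order)


# Toy data: names and sequences are made up for illustration only.
segments = {"exon2": "GGTTAA", "exon1": "ATGCCA"}
order = ["exon1", "exon2"]
print(splice_cds(segments, order))
```

    Keeping the exon order separate from the segments mirrors the abstract's description of a queue file that dictates the splice order independently of how the segments were extracted.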

  7. Revisiting the Physico-Chemical Hypothesis of Code Origin: An Analysis Based on Code-Sequence Coevolution in a Finite Population

    NASA Astrophysics Data System (ADS)

    Bandhu, Ashutosh Vishwa; Aggarwal, Neha; Sengupta, Supratim

    2013-12-01

    The origin of the genetic code marked a major transition from a plausible RNA world to the world of DNA and proteins and is an important milestone in our understanding of the origin of life. We examine the efficacy of the physico-chemical hypothesis of code origin by carrying out simulations of code-sequence coevolution in finite populations in stages, leading first to the emergence of ten amino acid code(s) and subsequently to 14 amino acid code(s). We explore two different scenarios of primordial code evolution. In one scenario, competition occurs between populations of equilibrated code-sequence sets, while in the other scenario, new codes compete with existing codes as they are gradually introduced into the population with a finite probability. In either case, we find that natural selection between competing codes distinguished by differences in the degree of physico-chemical optimization is unable to explain the structure of the standard genetic code. The code whose structure is most consistent with the standard genetic code is often not among the codes that have a high fixation probability. However, we find that the composition of the code population affects the code fixation probability. A physico-chemically optimized code gets fixed with a significantly higher probability if it competes against a set of randomly generated codes. Our results suggest that physico-chemical optimization may not be the sole driving force in ensuring the emergence of the standard genetic code.

  8. MACSE: Multiple Alignment of Coding SEquences Accounting for Frameshifts and Stop Codons

    PubMed Central

    Ranwez, Vincent; Harispe, Sébastien; Delsuc, Frédéric; Douzery, Emmanuel J. P.

    2011-01-01

    Until now the most efficient solution to align nucleotide sequences containing open reading frames was to use indirect procedures that align amino acid translations before reporting the inferred gap positions at the codon level. There are two important pitfalls with this approach. Firstly, any premature stop codon impedes using such a strategy. Secondly, each sequence is translated with the same reading frame from beginning to end, so that the presence of a single additional nucleotide leads to both aberrant translation and alignment. We present an algorithm that has the same space and time complexity as the classical Needleman-Wunsch algorithm while accommodating sequencing errors and other biological deviations from the coding frame. The resulting pairwise coding sequence alignment method was extended to a multiple sequence alignment (MSA) algorithm implemented in a program called MACSE (Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons). MACSE is the first automatic solution to align protein-coding gene datasets containing non-functional sequences (pseudogenes) without disrupting the underlying codon structure. It has also proved useful in detecting undocumented frameshifts in public database sequences and in aligning next-generation sequencing reads/contigs against a reference coding sequence. MACSE is distributed as an open-source Java executable file with freely available source code and can be used via a web interface at: http://mbb.univ-montp2.fr/macse. PMID:21949676
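
    For readers unfamiliar with the baseline the abstract compares against, the classical Needleman-Wunsch recurrence can be sketched as below. This is only the textbook global-alignment score, not the MACSE algorithm: frameshift and stop-codon costs are not modeled, and the scoring values (`match`, `mismatch`, `gap`) are arbitrary assumptions chosen for the example.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of strings a and b.

    Classical dynamic programming with a linear gap penalty. Only the
    score is computed, so two rows of the table are enough; recovering
    the alignment itself would need the full table for traceback.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("ATGC", "ATGC"))
print(needleman_wunsch_score("ATGC", "ATC"))
```

    The running time is proportional to the product of the two sequence lengths, which is the time complexity the abstract refers to.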

  9. Amino acid fermentation at the origin of the genetic code

    PubMed Central

    2012-01-01

    There is evidence that the genetic code was established prior to the existence of proteins, when metabolism was powered by ribozymes. Also, early proto-organisms had to rely on simple anaerobic bioenergetic processes. In this work I propose that amino acid fermentation powered metabolism in the RNA world, and that this was facilitated by proto-adapters, the precursors of the tRNAs. Amino acids were used as carbon sources rather than as catalytic or structural elements. In modern bacteria, amino acid fermentation is known as the Stickland reaction. This pathway involves two amino acids: the first undergoes oxidative deamination, and the second acts as an electron acceptor through reductive deamination. This redox reaction results in two keto acids that are employed to synthesise ATP via substrate-level phosphorylation. The Stickland reaction is the basic bioenergetic pathway of some bacteria of the genus Clostridium. Two other facts support Stickland fermentation in the RNA world. First, several Stickland amino acid pairs are synthesised in abiotic amino acid synthesis. This suggests that amino acids that could be used as an energy substrate were freely available. Second, anticodons that have complementary sequences often correspond to amino acids that form Stickland pairs. The main hypothesis of this paper is that pairs of complementary proto-adapters were assigned to Stickland amino acids pairs. There are signatures of this hypothesis in the genetic code. Furthermore, it is argued that the proto-adapters formed double strands that brought amino acid pairs into proximity to facilitate their mutual redox reaction, structurally constraining the anticodon pairs that are assigned to these amino acid pairs. Significance tests which randomise the code are performed to study the extent of the variability of the energetic (ATP) yield. Random assignments can lead to a substantial yield of ATP and maintain enough variability, thus selection can act and refine the assignments

  10. Variation in Seed Fatty Acid Composition, and Sequence Divergence in the FAD2 Gene Coding Region between Wild and Cultivated Sesame

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Sesame germplasm harbors genetic diversity which can be useful for sesame improvement in breeding programs. Seven accessions with different levels of oleic acid were selected from the entire USDA sesame germplasm collection (1232 accessions) and planted for morphological observation and re-examinati...

  11. Correlation approach to identify coding regions in DNA sequences

    NASA Technical Reports Server (NTRS)

    Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1994-01-01

    Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.

  12. FRAGS: estimation of coding sequence substitution rates from fragmentary data

    PubMed Central

    Swart, Estienne C; Hide, Winston A; Seoighe, Cathal

    2004-01-01

    Background: Rates of substitution in protein-coding sequences can provide important insights into evolutionary processes that are of biomedical and theoretical interest. Increased availability of coding sequence data has enabled researchers to estimate more accurately the coding sequence divergence of pairs of organisms. However the use of different data sources, alignment protocols and methods to estimate substitution rates leads to widely varying estimates of key parameters that define the coding sequence divergence of orthologous genes. Although complete genome sequence data are not available for all organisms, fragmentary sequence data can provide accurate estimates of substitution rates provided that an appropriate and consistent methodology is used and that differences in the estimates obtainable from different data sources are taken into account. Results: We have developed FRAGS, an application framework that uses existing, freely available software components to construct in-frame alignments and estimate coding substitution rates from fragmentary sequence data. Coding sequence substitution estimates for human and chimpanzee sequences, generated by FRAGS, reveal that methodological differences can give rise to significantly different estimates of important substitution parameters. The estimated substitution rates were also used to infer upper-bounds on the amount of sequencing error in the datasets that we have analysed. Conclusion: We have developed a system that performs robust estimation of substitution rates for orthologous sequences from a pair of organisms. Our system can be used when fragmentary genomic or transcript data is available from one of the organisms and the other is a completely sequenced genome within the Ensembl database. As well as estimating substitution statistics our system enables the user to manage and query alignment and substitution data. PMID:15005802

  13. Evolution from primordial oligomeric repeats to modern coding sequences.

    PubMed

    Ohno, S

    1987-01-01

    It seems as though nature was most innovative at the very beginning of life on this Earth a few billion years ago. For example, the functional competence of most, if not all, of the sugar-metabolizing enzymes was clearly established before the division of eukaryotes from prokaryotes eons ago, each critical active-site amino acid sequence being conserved ever since by bacteria as well as by mammals. I contend that this initial innovativeness was due to the first set of coding sequences being repeats of base oligomers, thus encoding polypeptide chains of various periodicities; such periodical polypeptide chains can easily acquire alpha-helical and beta-sheet-forming segments. In fact, the entire length of sugar-metabolizing enzymes is comprised of alternating alpha-helical and beta-sheet-forming segments. In the prebiotic (therefore nonenzymatic) replication of nucleic acids, what was in short supply was long templates, for there apparently was no inherent obstacle in copying of long templates, if such existed, in the presence of Zn2+. I submit that in this prebiotic condition, only those nucleotide oligomers that were internal doubles were automatically assured of progressive elongation to become long templates. For example, a decamer that was a pentameric repeat and its complementary sequence may pair unequally to initiate the next round of replication: first unit pairing with second, and a paired segment serving as a primer. As a consequence of this unequal pairing, decameric templates managed to become pentadecameric templates only after one round of replication, and this elongation process had no inherent limit.

  14. Coding sequences of functioning human genes derived entirely from mobile element sequences.

    PubMed

    Britten, Roy J

    2004-11-30

    Among all of the many examples of mobile elements or "parasitic sequences" that affect the function of the human genome, this paper describes several examples of functioning genes whose sequences have been almost completely derived from mobile elements. There are many examples where the synthetic coding sequences of observed mRNA sequences are made up of mobile element sequences, to an extent of 80% or more of the length of the coding sequences. In the examples described here, the genes have named functions, and some of these functions have been studied. It appears that each of the functioning genes was originally formed from mobile elements and that in some process of molecular evolution a coding sequence was derived that could be translated into a protein that is of some importance to human biology. In one case (AD7C), the coding sequence is 99% made up of a cluster of Alu sequences. In another example, the gene BNIP3 coding sequence is 97% made up of sequences from an apparent human endogenous retrovirus. The Syncytin gene coding sequence appears to be made from an endogenous retrovirus envelope gene. PMID:15546984

  15. Sequence of the Ampullariella sp. strain 3876 gene coding for xylose isomerase.

    PubMed

    Saari, G C; Kumar, A A; Kawasaki, G H; Insley, M Y; O'Hara, P J

    1987-02-01

    The nucleotide sequence of the gene coding for xylose isomerase from Ampullariella sp. strain 3876, a gram-positive bacterium, has been determined. A clone of a fragment of strain 3876 DNA coding for a xylose isomerase activity was identified by its ability to complement a xylose isomerase-defective Escherichia coli strain. One such complementation positive fragment, 2,922 nucleotides in length, was sequenced in its entirety. There are two open reading frames 1,182 and 1,242 nucleotides in length, on opposite strands of this fragment, each of which could code for a protein the expected size of xylose isomerase. The 1,182-nucleotide open reading frame was identified as the coding sequence for the protein from the sequence analysis of the amino-terminal region and selected internal peptides. The gene initiates with GTG and has a high guanine and cytosine content (70%) and an exceptionally strong preference (97%) for guanine or cytosine in the third position of the codons. The gene codes for a 43,210-dalton polypeptide composed of 393 amino acids. The xylose isomerase from Ampullariella sp. strain 3876 is similar in size to other bacterial xylose isomerases and has limited amino acid sequence homology to the available sequences from E. coli, Bacillus subtilis, and Streptomyces violaceus-ruber. In all cases yet studied, the bacterial gene for xylulose kinase is downstream from the gene for xylose isomerase. We present evidence suggesting that in Ampullariella sp. strain 3876 these genes are similarly arranged. PMID:3027039
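
    The two composition figures quoted above (overall guanine plus cytosine content, and the preference for G or C in the third codon position) can be computed as in the sketch below. The function names and the toy sequence are assumptions for illustration; the real gene sequence is not reproduced here. As a consistency check on the abstract's numbers, 393 codons plus one stop codon is 393 × 3 + 3 = 1,182 nucleotides, matching the stated open reading frame length.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_fraction(cds):
    """Fraction of third codon positions in an in-frame CDS that are G or C."""
    third_positions = cds.upper()[2::3]  # every third base, starting at index 2
    if not third_positions:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in third_positions) / len(third_positions)


# Toy in-frame sequence starting with GTG (made up; not the Ampullariella gene).
toy_cds = "GTGGCCACGTGA"
print(round(gc_fraction(toy_cds), 3))
print(round(gc3_fraction(toy_cds), 3))
```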

  16. Sequence of the Ampullariella sp. strain 3876 gene coding for xylose isomerase.

    PubMed Central

    Saari, G C; Kumar, A A; Kawasaki, G H; Insley, M Y; O'Hara, P J

    1987-01-01

    The nucleotide sequence of the gene coding for xylose isomerase from Ampullariella sp. strain 3876, a gram-positive bacterium, has been determined. A clone of a fragment of strain 3876 DNA coding for a xylose isomerase activity was identified by its ability to complement a xylose isomerase-defective Escherichia coli strain. One such complementation positive fragment, 2,922 nucleotides in length, was sequenced in its entirety. There are two open reading frames 1,182 and 1,242 nucleotides in length, on opposite strands of this fragment, each of which could code for a protein the expected size of xylose isomerase. The 1,182-nucleotide open reading frame was identified as the coding sequence for the protein from the sequence analysis of the amino-terminal region and selected internal peptides. The gene initiates with GTG and has a high guanine and cytosine content (70%) and an exceptionally strong preference (97%) for guanine or cytosine in the third position of the codons. The gene codes for a 43,210-dalton polypeptide composed of 393 amino acids. The xylose isomerase from Ampullariella sp. strain 3876 is similar in size to other bacterial xylose isomerases and has limited amino acid sequence homology to the available sequences from E. coli, Bacillus subtilis, and Streptomyces violaceus-ruber. In all cases yet studied, the bacterial gene for xylulose kinase is downstream from the gene for xylose isomerase. We present evidence suggesting that in Ampullariella sp. strain 3876 these genes are similarly arranged. PMID:3027039

  17. The Coding and Inter-Manual Transfer of Movement Sequences

    PubMed Central

    Shea, Charles H.; Kovacs, Attila J.; Panzer, Stefan

    2011-01-01

    The manuscript reviews recent experiments that use inter-manual transfer and inter-manual practice paradigms to determine the coordinate system (visual–spatial or motor) used in the coding of movement sequences during physical and observational practice. The results indicated that multi-element movement sequences are more effectively coded in visual–spatial coordinates even following extended practice, while very early in practice movement sequences with only a few movement elements and relatively short durations are coded in motor coordinates. Likewise, inter-manual practice of relatively simple movement sequences shows benefits of right and left limb practice that involves the same motor coordinates, while the opposite is true for more complex sequences. The results suggest that the coordinate system used to code the sequence information is linked to both the task characteristics and the control processes used to produce the sequence. These findings have the potential to greatly enhance our understanding of why, in some conditions, participants following practice with one limb or observation of one limb practice can effectively perform the task with the contralateral limb, while in other (often similar) conditions they cannot. PMID:21716583

  18. Chip-based sequencing nucleic acids

    SciTech Connect

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  19. Algebraic solution of the synthesis problem for coded sequences

    SciTech Connect

    Leukhin, Anatolii N

    2005-08-31

    An algebraic solution is proposed for the 'complex' problem of synthesising phase-coded (PC) sequences with a zero level of side lobes of the cyclic autocorrelation function (ACF). It is shown that the solution of the synthesis problem is connected with the existence of difference sets for a given code dimension. The problem of estimating the number of possible code combinations for a given code dimension is solved. It is pointed out that the problem of synthesis of PC sequences is related to the fundamental problems of discrete mathematics and, first of all, to a number of combinatorial problems, which can be solved, like the number factorisation problem, by algebraic methods using the theory of Galois fields and groups. (Fourth Seminar to the Memory of D.N. Klyshko)

  20. Streamlined Genome Sequence Compression using Distributed Source Coding

    PubMed Central

    Wang, Shuang; Jiang, Xiaoqian; Chen, Feng; Cui, Lijuan; Cheng, Samuel

    2014-01-01

    We aim to develop a streamlined genome sequence compression algorithm to support alternative miniaturized sequencing devices, which have limited communication, storage, and computation power. Existing techniques that require a heavy client (encoder side) cannot be applied. To tackle this challenge, we carefully examined distributed source coding theory and developed a customized reference-based genome compression protocol to meet the low-complexity need at the client side. Based on the variation between source and reference, our protocol adaptively picks either syndrome coding or hash coding to compress subsequences of changing code length. Our experimental results showed promising performance of the proposed method when compared with the state-of-the-art algorithm (GRS). PMID:25520552

  1. ICDS database: interrupted CoDing sequences in prokaryotic genomes.

    PubMed

    Perrodou, Emmanuel; Deshayes, Caroline; Muller, Jean; Schaeffer, Christine; Van Dorsselaer, Alain; Ripp, Raymond; Poch, Olivier; Reyrat, Jean-Marc; Lecompte, Odile

    2006-01-01

    Unrecognized frameshifts, in-frame stop codons and sequencing errors lead to Interrupted CoDing Sequence (ICDS) that can seriously affect all subsequent steps of functional characterization, from in silico analysis to high-throughput proteomic projects. Here, we describe the Interrupted CoDing Sequence database containing ICDS detected by a similarity-based approach in 80 complete prokaryotic genomes. ICDS can be retrieved by species browsing or similarity searches via a web interface (http://www-bio3d-igbmc.u-strasbg.fr/ICDS/). The definition of each interrupted gene is provided as well as the ICDS genomic localization with the surrounding sequence. Furthermore, to facilitate the experimental characterization of ICDS, we propose optimized primers for re-sequencing purposes. The database will be regularly updated with additional data from ongoing sequenced genomes. Our strategy has been validated by three independent tests: (i) ICDS prediction on a benchmark of artificially created frameshifts, (ii) comparison of predicted ICDS and results obtained from the comparison of the two genomic sequences of Bacillus licheniformis strain ATCC 14580 and (iii) re-sequencing of 25 predicted ICDS of the recently sequenced genome of Mycobacterium smegmatis. This allows us to estimate the specificity and sensitivity (95 and 82%, respectively) of our program and the efficiency of primer determination.

  2. Distinguishing proteins from arbitrary amino acid sequences.

    PubMed

    Yau, Stephen S-T; Mao, Wei-Guang; Benson, Max; He, Rong Lucy

    2015-01-01

    What kinds of amino acid sequences could possibly be protein sequences? From all existing databases that we can find, known proteins are only a small fraction of all possible combinations of amino acids. Beginning with Sanger's first detailed determination of a protein sequence in 1952, previous studies have focused on describing the structure of existing protein sequences in order to construct the protein universe. No one, however, has developed a criterion for determining whether an arbitrary amino acid sequence can be a protein. Here we show that when the collection of arbitrary amino acid sequences is viewed in an appropriate geometric context, the protein sequences cluster together. This leads to a new computational test, described here, that has proved to be remarkably accurate at determining whether an arbitrary amino acid sequence can be a protein. Even more, if the results of this test indicate that the sequence can be a protein, and it is indeed a protein sequence, then its identity as a protein sequence is uniquely defined. We anticipate our computational test will be useful for those who are attempting to complete the job of discovering all proteins, or constructing the protein universe. PMID:25609314

  3. Genetic characterization of three novel chicken parvovirus strains based on analysis of their coding sequences.

    PubMed

    Koo, Bon-Sang; Lee, Hae-Rim; Jeon, Eun-Ok; Han, Moo-Sung; Min, Kyeong-Cheol; Lee, Seung-Baek; Bae, Yeon-Ji; Cho, Sun-Hyung; Mo, Jong-Suk; Kwon, Hyuk Moo; Sung, Haan Woo; Kim, Jong-Nyeo; Mo, In-Pil

    2015-01-01

    Chicken parvovirus (ChPV) is one of the causative agents of viral enteritis. Recently, the genome of the ABU-P1 strain of ChPV was fully sequenced and determined to have a distinct genomic composition compared with that of vertebrate parvoviruses. However, no comparative sequence analysis of coding regions of ChPVs was possible because of the lack of other sequence information. In this study, we obtained the nucleotide sequences of all genomic coding regions of three ChPVs by polymerase chain reaction using 13 primer sets, and deduced the amino acid sequences from the nucleotide sequences. The non-structural protein 1 (NS1) gene of the three ChPVs showed 95.0 to 95.5% nucleotide sequence identity and 96.5 to 98.1% amino acid sequence identity to those of NS1 from the ABU-P1 strain, respectively, and even higher nucleotide and amino acid similarities to one another. The viral proteins (VP) gene was more divergent between the three ChPV Korean strains and ABU-P1, with 88.1 to 88.3% nucleotide identity and 93.0% amino acid identity. Analysis of the putative tertiary structure of the ChPV VP2 protein showed that variable regions with less than 80% nucleotide similarity between the three Korean strains and ABU-P1 occurred in large loops of the VP2 protein believed to be involved in antigenicity, pathogenicity, and tissue tropism in other parvoviruses. Based on our analysis of full-length coding sequences, we discovered greater variation in ChPV strains than reported previously, especially in partial regions of the VP2 protein.

  4. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-06-06

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  5. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-05-30

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  6. The primordial sequence, ribosomes, and the genetic code.

    NASA Technical Reports Server (NTRS)

    Fox, S. W.; Yuki, A.; Waehneldt, T. V.; Lacey, J. C., Jr.

    1971-01-01

    Experimental investigation of the key question of the origin of life concerning the chronological order in the primordial sequence of nucleic acid, protein, and cell. It is pointed out that, when viewed against the background of experiments on the selective reaction of basic homopolyamino acids with mononucleotides (Lacey and Pruitt, 1969; Woese, 1968), the experiments made help to establish a basis for understanding how information originally flowed from proteins to nucleic acids.

  7. Radio frequency interference effect on PN code sequence lock detector

    NASA Technical Reports Server (NTRS)

    Kwon, Hyuck M.; Tu, Kwei; Loh, Y. C.

    1991-01-01

    The authors find the probabilities of detection and false alarm of the pseudonoise (PN) sequence code lock detector when strong radio frequency interference (RFI) hits the communications link. Both a linear model and a soft-limiter nonlinear model for a transponder receiver are considered. In addition, both continuous wave (CW) RFI and pulse RFI are analyzed, and a discussion is included of how strong CW RFI can knock out the PN code lock detector in a linear or a soft-limiter transponder. As an example, the Space Station Freedom forward S-band PN system is evaluated. It is shown that a soft-limiter transponder can protect the PN code lock detector against a typical pulse RFI, but it can degrade the PN code lock detector performance more than a linear transponder if CW RFI hits the link.

  8. Nonlinear Aspects of Coding and Noncoding DNA Sequences

    NASA Astrophysics Data System (ADS)

    Stanley, H. Eugene

    2001-03-01

    One of the most remarkable features of human DNA is that 97 percent is not coding for proteins. Studying this noncoding DNA is important both for practical reasons (to distinguish it from the coding DNA as the human genome is sequenced), and for scientific reasons (why is the noncoding DNA present at all, if it appears to have little if any purpose?). In this talk we discuss new methods of analyzing coding and noncoding DNA in parallel, with a view to uncovering different statistical properties of the two kinds of DNA. We also speculate on possible roles of noncoding DNA. The work reported here was carried out primarily by P. Bernaola-Galvan, S. V. Buldyrev, P. Carpena, N. Dokholyan, A. L. Goldberger, I. Grosse, S. Havlin, H. Herzel, J. L. Oliver, C.-K. Peng, M. Simons, H. E. Stanley, R. H. R. Stanley, and G. M. Viswanathan. [1] For a brief overview in language that physicists can understand, see H. E. Stanley, S. V. Buldyrev, A. L. Goldberger, S. Havlin, C.-K. Peng, and M. Simons, "Scaling Features of Noncoding DNA" [Proc. XII Max Born Symposium, Wroclaw], Physica A 273, 1-18 (1999). [2] I. Grosse, H. Herzel, S. V. Buldyrev, and H. E. Stanley, "Species Independence of Mutual Information in Coding and Noncoding DNA," Phys. Rev. E 61, 5624-5629 (2000). [3] P. Bernaola-Galvan, I. Grosse, P. Carpena, J. L. Oliver, and H. E. Stanley, "Identification of DNA Coding Regions Using an Entropic Segmentation Method," Phys. Rev. Lett. 84, 1342-1345 (2000). [4] N. Dokholyan, S. V. Buldyrev, S. Havlin, and H. E. Stanley, "Distributions of Dimeric Tandem Repeats in Non-coding and Coding DNA Sequences," J. Theor. Biol. 202, 273-282 (2000). [5] R. H. R. Stanley, N. V. Dokholyan, S. V. Buldyrev, S. Havlin, and H. E. Stanley, "Clumping of Identical Oligonucleotides in Coding and Noncoding DNA Sequences," J. Biomol. Structure and Design 17, 79-87 (1999). [6] N. Dokholyan, S. V. Buldyrev, S. Havlin, and H. E. Stanley, "Distribution of Base Pair Repeats in Coding and Noncoding DNA

  9. Hiding message into DNA sequence through DNA coding and chaotic maps.

    PubMed

    Liu, Guoyan; Liu, Hongjun; Kadir, Abdurahman

    2014-09-01

    The paper proposes an improved reversible substitution method for hiding data in a deoxyribonucleic acid (DNA) sequence. Four measures are taken to enhance robustness and enlarge hiding capacity: encoding the secret message by DNA coding, encrypting it with a pseudo-random sequence, generating the relative hiding locations with a piecewise linear chaotic map, and embedding the encoded and encrypted message into a randomly selected DNA sequence using the complementary rule. The key space and the hiding capacity are analyzed. Experimental results indicate that the proposed method outperforms competing methods with respect to robustness and capacity. PMID:25023893
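
    As a rough illustration of two of the ingredients named in this abstract, the Python sketch below shows a 2-bit DNA coding of a byte string and an iteration of a piecewise linear chaotic map. The bit-to-base table, the function names and the parameters `x0` and `p` are assumptions made for illustration; the abstract does not give the paper's actual coding rule, key schedule or embedding step, and none of those are reproduced here.

```python
# Hypothetical 2-bit DNA coding rule; the paper's actual rule is not stated in the abstract.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_message(message: bytes) -> str:
    """Map each byte to four nucleotides, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in message)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_message(dna: str) -> bytes:
    """Invert encode_message."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def pwlcm(x: float, p: float) -> float:
    """One iteration of the piecewise linear chaotic map on [0, 1], with 0 < p < 0.5."""
    if x >= 0.5:
        x = 1.0 - x  # the map is symmetric about 0.5
    if x < p:
        return x / p
    return (x - p) / (0.5 - p)


def chaotic_sequence(x0: float, p: float, n: int) -> list:
    """Iterate the map n times from seed x0; (x0, p) play the role of a secret key."""
    values = []
    x = x0
    for _ in range(n):
        x = pwlcm(x, p)
        values.append(x)
    return values


if __name__ == "__main__":
    dna = encode_message(b"Hi")
    print(dna, decode_message(dna))
    print(chaotic_sequence(0.1, 0.25, 5))
```

    In a scheme of this kind the chaotic values would typically be scaled to integer offsets to pick hiding positions; how the paper does this is not specified in the abstract.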

  10. Bovine Parathyroid Hormone: Amino Acid Sequence

    PubMed Central

    Brewer, H. Bryan; Ronan, Rosemary

    1970-01-01

    Bovine parathyroid hormone has been isolated in homogeneous form, and its complete amino acid sequence determined. The bovine hormone is a single chain, 84 amino acids long. It contains amino-terminal alanine, and carboxyl-terminal glutamine. The bovine parathyroid hormone is approximately three times the length of the newly discovered hormone, thyrocalcitonin, whose action is reciprocal to parathyroid hormone. PMID:5275384

  11. Sequence and Structural Analyses for Functional Non-coding RNAs

    NASA Astrophysics Data System (ADS)

    Sakakibara, Yasubumi; Sato, Kengo

    Analysis and detection of functional RNAs are currently important topics in both molecular biology and bioinformatics research. Several computational methods based on stochastic context-free grammars (SCFGs) have been developed for modeling and analysing functional RNA sequences. These grammatical methods have succeeded in modeling typical secondary structures of RNAs and are used for structural alignments of RNA sequences. Such stochastic models, however, are not sufficient to discriminate member sequences of an RNA family from non-members, and hence to detect non-coding RNA regions from genome sequences. Recently, the support vector machine (SVM) and kernel function techniques have been actively studied and proposed as a solution to various problems in bioinformatics. SVMs are trained from positive and negative samples and have strong, accurate discrimination abilities, and hence are more appropriate for the discrimination tasks. A few kernel functions that extend the string kernel to measure the similarity of two RNA sequences from the viewpoint of secondary structures have been proposed. In this article, we give an overview of recent progress in SCFG-based methods for RNA sequence analysis and novel kernel functions tailored to measure the similarity of two RNA sequences and developed for use with support vector machines (SVM) in discriminating members of an RNA family from non-members.
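
    To make the notion of a string kernel concrete, the sketch below computes a plain k-mer spectrum kernel between two RNA sequences. This is a deliberately simpler stand-in: it compares sequence content only and does not include the secondary-structure information that the kernels reviewed in the article are designed to capture. The function names and the default `k = 3` are assumptions for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3) -> float:
    """Inner product of the two k-mer count vectors."""
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    return float(sum(n * counts_b[kmer] for kmer, n in counts_a.items()))


def normalized_kernel(a: str, b: str, k: int = 3) -> float:
    """Cosine-normalized kernel, so that a sequence compared with itself scores 1."""
    denom = sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))
    return spectrum_kernel(a, b, k) / denom if denom else 0.0


if __name__ == "__main__":
    print(normalized_kernel("GGGAAACCC", "GGGAAACCU"))
```

    A Gram matrix built from such a kernel could be passed to any SVM implementation that accepts precomputed kernels.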

  12. Multifractal detrended cross-correlation analysis of coding and non-coding DNA sequences through chaos-game representation

    NASA Astrophysics Data System (ADS)

    Pal, Mayukha; Satish, B.; Srinivas, K.; Rao, P. Madhusudana; Manimaran, P.

    2015-10-01

    We propose a new approach combining the chaos game representation and the two dimensional multifractal detrended cross correlation analysis methods to examine multifractal behavior in power law cross correlation between any pair of nucleotide sequences of unequal lengths. In this work, we analyzed the characteristic behavior of coding and non-coding DNA sequences of eight prokaryotes. The results show the presence of strong multifractal nature between coding and non-coding sequences of all data sets. We found that this integrative approach helps us to consider complete DNA sequences for characterization, and further it may be useful for classification, clustering, identification of class affiliation of nucleotide sequences etc. with high precision.
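
    The chaos game representation step can be sketched in a few lines of Python. The corner assignment below (A, C, G, T at the corners of the unit square) is a common convention and an assumption here, as are the function names; the multifractal detrended cross-correlation analysis itself is not shown.

```python
# Assumed corner layout of the unit square; other assignments are also in use.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq: str) -> list:
    """Return the chaos game trajectory: each point is the midpoint between
    the previous point and the corner of the current nucleotide."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cgr_grid(seq: str, k: int) -> list:
    """Bin the trajectory into a 2^k x 2^k matrix of counts."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(seq):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    print(cgr_points("AC"))
    print(cgr_grid("ACGT", 1))
```

    Because the grid has a fixed size regardless of sequence length, two sequences of unequal length yield matrices of the same shape, which is what makes a two-dimensional comparison possible.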

  13. On combining protein sequences and nucleic acid sequences in phylogenetic analysis: the homeobox protein case.

    PubMed

    Agosti, D; Jacobs, D; DeSalle, R

    1996-01-01

    Amino acid encoding genes contain character state information that may be useful for phylogenetic analysis on at least two levels. The nucleotide sequence and the translated amino acid sequences have both been employed separately as character states for cladistic studies of various taxa, including studies of the genealogy of genes in multigene families. In essence, amino acid sequences and nucleic acid sequences are two different ways of character coding the information in a gene. Silent positions in the nucleotide sequence (first or third positions in codons that can accrue change without changing the identity of the amino acid that the triplet codes for) may accrue change relatively rapidly and become saturated, losing the pattern of historical divergence. On the other hand, non-silent nucleotide alterations and their accompanying amino acid changes may evolve too slowly to reveal relationships among closely related taxa. In general, the dynamics of sequence change in silent and non-silent positions in protein coding genes result in homoplasy and lack of resolution, respectively. We suggest that the combination of nucleic acid and the translated amino acid coded character states into the same data matrix for phylogenetic analysis addresses some of the problems caused by the rapid change of silent nucleotide positions and overall slow rate of change of non-silent nucleotide positions and slowly changing amino acid positions. One major theoretical problem with this approach is the apparent non-independence of the two sources of characters. However, there are at least three possible outcomes when comparing protein coding nucleic acid sequences with their translated amino acids in a phylogenetic context on a codon by codon basis. First, the two character sets for a codon may be entirely congruent with respect to the information they convey about the relationships of a certain set of taxa. Second, one character set may display no information concerning a phylogenetic

  14. Complete coding sequences of the rabbitpox virus genome.

    PubMed

    Li, G; Chen, N; Roper, R L; Feng, Z; Hunter, A; Danila, M; Lefkowitz, E J; Buller, R M L; Upton, C

    2005-11-01

    Rabbitpox virus (RPXV) is highly virulent for rabbits and it has long been suspected to be a close relative of vaccinia virus. To explore these questions, the complete coding region of the rabbitpox virus genome was sequenced to permit comparison with sequenced strains of vaccinia virus and other orthopoxviruses. The genome of RPXV strain Utrecht (RPXV-UTR) is 197 731 nucleotides long, excluding the terminal hairpin structures at each end of the genome. The RPXV-UTR genome has 66.5 % A + T content, 184 putative functional genes and 12 fragmented ORF regions that are intact in other orthopoxviruses. The sequence of the RPXV-UTR genome reveals that two RPXV-UTR genes have orthologues in variola virus (VARV; the causative agent of smallpox), but not in vaccinia virus (VACV) strains. These genes are a zinc RING finger protein gene (RPXV-UTR-008) and an ankyrin repeat family protein gene (RPXV-UTR-180). A third gene, encoding a chemokine-binding protein (RPXV-UTR-001/184), is complete in VARV but functional only in some VACV strains. Examination of the evolutionary relationship between RPXV and other orthopoxviruses was carried out using the central 143 kb DNA sequence conserved among all completely sequenced orthopoxviruses and also the protein sequences of 49 gene products present in all completely sequenced chordopoxviruses. The results of these analyses both confirm that RPXV-UTR is most closely related to VACV and suggest that RPXV has not evolved directly from any of the sequenced VACV strains, since RPXV contains a 719 bp region not previously identified in any VACV.

  15. Ribosomal profiling adds new coding sequences to the proteome.

    PubMed

    Mumtaz, Muhammad Ali S; Couso, Juan Pablo

    2015-12-01

    Next generation sequencing (NGS) has enabled an in-depth look into genes, transcripts and their translation at the genomic scale. The application of NGS sequencing of ribosome footprints (Ribo-Seq) reveals translation with single nucleotide (nt) resolution, through the deep sequencing of ribosome-bound fragments (RBFs). Some results of Ribo-Seq challenge our understanding of the protein-coding potential of the genome. Earlier bioinformatic approaches had shown the presence of hundreds of thousands of putative small ORFs (smORFs) in eukaryotic genomes, but they had been largely ignored due to their large numbers and difficulty in determining their translation and function. Ribo-Seq has revealed that hundreds of putative smORFs within previously assumed long non-coding RNAs (lncRNAs) and UTRs of canonical mRNAs are associated with ribosomes, appearing to be translated. Here we review some of the approaches used to define translation within Ribo-Seq experiments and the challenges in defining translation of these novel smORFs in lncRNAs and UTRs. We also look at some of the bioinformatic and biochemical approaches used to independently corroborate these exciting new findings and elucidate real translation events.
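
    The purely sequence-based prediction of putative smORFs mentioned above can be sketched as a scan for short ATG-to-stop reading frames. The length bounds (10 to 100 codons), the function name and the restriction to the forward strand are assumptions for illustration; such a scan says nothing about whether a candidate is actually translated, which is the evidence Ribo-Seq adds.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_smorfs(seq: str, min_codons: int = 10, max_codons: int = 100) -> list:
    """Scan the three forward reading frames for ATG...stop ORFs.

    Returns (start, end, n_codons) tuples, where end is exclusive and includes
    the stop codon, and n_codons counts the start codon but not the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if min_codons <= n_codons <= max_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None
    return orfs


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 11 + "TAA" + "GG"
    print(find_smorfs(toy))
```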

  16. [Evolution of non-coding nucleotide sequences in Newcastle disease virus genomes].

    PubMed

    Xu, Huaiying; Qin, Zhuoming; Qi, Lihong; Zhang, Wei; Wang, Youling; Liu, Jinhua

    2014-09-01

    [OBJECTIVE] Although the coding genes of Newcastle disease virus (NDV) have been studied extensively, few papers address its non-coding sequences. In this paper, the evolutionary tendency of the non-coding sequences was studied. [METHODS] The genome cDNA sequences of NDV strain LC12, isolated in 2012 from a duck with egg drop syndrome, and of 35 other strains of different NDV genotypes were obtained from GenBank. Nucleotide homology, nucleotide alignment and phylogenetic tree analyses were applied to the leader sequences, trailer sequences, intergenic sequences (IGS), and the 5' and 3' UTR nucleotides of the coding genes, respectively. [RESULTS] The location and length of the non-coding sequences are highly conserved, and the variation trend of the non-coding sequences is synchronous with that of the entire genomes and coding genes. [CONCLUSION] Within the NDV genome, the molecular variation of the coding genes was indistinguishable from that of the non-coding sequences. PMID:25522596

  17. Licensee Event Report sequence coding and search procedure workshop

    SciTech Connect

    Cottrell, W.B.; Gallaher, R.B.

    1981-03-01

    Since mid-1980, the Office for Analysis and Evaluation of Operational Data (AEOD) of the Nuclear Regulatory Commission (NRC) has been developing procedures for the systematic review and analysis of Licensee Event Reports (LERs). These procedures generally address several areas of concern, including identification of significant trends and patterns, event sequence of occurrences, component failures, and system and plant effects. The AEOD and NSIC conducted a workshop on the new coding procedure at the American Museum of Science and Energy in Oak Ridge, TN, on November 24, 1980.

  18. Code-Time Diversity for Direct Sequence Spread Spectrum Systems

    PubMed Central

    Hassan, A. Y.

    2014-01-01

    Time diversity is achieved in direct sequence spread spectrum by receiving different faded delayed copies of the transmitted symbols from different uncorrelated channel paths when the transmission signal bandwidth is greater than the coherence bandwidth of the channel. In this paper, a new time diversity scheme, called code-time diversity, is proposed for spread spectrum systems. In this scheme, N spreading codes are used to transmit one data symbol over N successive symbol intervals. The diversity order of the proposed scheme equals the number of spreading codes N multiplied by the number of uncorrelated channel paths L. The paper presents the transmitted signal model. Two demodulator structures are proposed based on the received signal models for Rayleigh flat and frequency-selective fading channels. The probability of error of the proposed diversity scheme is also calculated for the same two fading channels. Finally, simulation results are presented and compared with those of maximal ratio combiner (MRC) and multiple-input multiple-output (MIMO) systems. PMID:24982925
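
    The transmit side of the scheme, one symbol repeated over N intervals with a different spreading code in each, can be illustrated with a toy noiseless example. Everything below is an assumption for illustration: the codes are arbitrary sequences of +1/-1 chips, the channel is ideal, and the receiver simply averages the per-interval correlations, which is not one of the demodulator structures proposed in the paper.

```python
def spread(symbol: int, codes: list) -> list:
    """Send one symbol over len(codes) successive intervals, one code per interval."""
    return [[symbol * chip for chip in code] for code in codes]


def despread(received: list, codes: list) -> float:
    """Correlate each interval with its own code, then average the N estimates."""
    estimates = [
        sum(r * c for r, c in zip(block, code)) / len(code)
        for block, code in zip(received, codes)
    ]
    return sum(estimates) / len(estimates)


def diversity_order(n_codes: int, n_paths: int) -> int:
    """Diversity order stated in the abstract: N codes times L uncorrelated paths."""
    return n_codes * n_paths


if __name__ == "__main__":
    codes = [[1, -1, 1, -1], [1, 1, -1, -1]]  # hypothetical spreading codes, N = 2
    tx = spread(-1, codes)
    print(tx, despread(tx, codes), diversity_order(2, 3))
```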

  19. Unnatural reactive amino acid genetic code additions

    DOEpatents

    Deiters, Alexander; Cropp, Ashton T; Chin, Jason W; Anderson, Christopher J; Schultz, Peter G

    2013-05-21

    This invention provides compositions and methods for producing translational components that expand the number of genetically encoded amino acids in eukaryotic cells. The components include orthogonal tRNAs, orthogonal aminoacyl-tRNA synthetases, pairs of tRNAs/synthetases and unnatural amino acids. Proteins and methods of producing proteins with unnatural amino acids in eukaryotic cells are also provided.

  20. Unnatural reactive amino acid genetic code additions

    DOEpatents

    Deiters, Alexander; Cropp, T. Ashton; Chin, Jason W.; Anderson, J. Christopher; Schultz, Peter G.

    2014-08-26

    This invention provides compositions and methods for producing translational components that expand the number of genetically encoded amino acids in eukaryotic cells. The components include orthogonal tRNAs, orthogonal aminoacyl-tRNA synthetases, orthogonal pairs of tRNAs/synthetases and unnatural amino acids. Proteins and methods of producing proteins with unnatural amino acids in eukaryotic cells are also provided.

  1. Unnatural reactive amino acid genetic code additions

    SciTech Connect

    Deiters, Alexander; Cropp, T. Ashton; Chin, Jason W.; Anderson, J. Christopher; Schultz, Peter G.

    2011-02-15

    This invention provides compositions and methods for producing translational components that expand the number of genetically encoded amino acids in eukaryotic cells. The components include orthogonal tRNAs, orthogonal aminoacyl-tRNA synthetases, orthogonal pairs of tRNAs/synthetases and unnatural amino acids. Proteins and methods of producing proteins with unnatural amino acids in eukaryotic cells are also provided.

  2. Unnatural reactive amino acid genetic code additions

    SciTech Connect

    Deiters, Alexander; Cropp, T. Ashton; Chin, Jason W.; Anderson, J. Christopher; Schultz, Peter G.

    2011-08-09

    This invention provides compositions and methods for producing translational components that expand the number of genetically encoded amino acids in eukaryotic cells. The components include orthogonal tRNAs, orthogonal aminoacyl-tRNA synthetases, pairs of tRNAs/synthetases and unnatural amino acids. Proteins and methods of producing proteins with unnatural amino acids in eukaryotic cells are also provided.

  3. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-07-21

    A method is disclosed for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe. 11 figs.

  4. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe.

  5. Cloning and nucleotide sequence of the gene coding for citrate synthase from a thermotolerant Bacillus sp.

    PubMed Central

    Schendel, F J; August, P R; Anderson, C R; Hanson, R S; Flickinger, M C

    1992-01-01

    The structural gene coding for citrate synthase from the gram-positive soil isolate Bacillus sp. strain C4 (ATCC 55182) capable of secreting acetic acid at pH 5.0 to 7.0 in the presence of dolime has been cloned from a genomic library by complementation of an Escherichia coli auxotrophic mutant lacking citrate synthase. The nucleotide sequence of the entire 3.1-kb HindIII fragment has been determined, and one major open reading frame was found coding for citrate synthase (ctsA). Citrate synthase from Bacillus sp. strain C4 was found to be a dimer (Mr, 84,500) with a subunit with an Mr of 42,000. The N-terminal sequence was found to be identical with that predicted from the gene sequence. The kinetics were best fit to a bisubstrate enzyme with an ordered mechanism. Bacillus sp. strain C4 citrate synthase was not activated by potassium chloride and was not inhibited by NADH, ATP, ADP, or AMP at levels up to 1 mM. The predicted amino acid sequence was compared with that of the E. coli, Acinetobacter anitratum, Pseudomonas aeruginosa, Rickettsia prowazekii, porcine heart, and Saccharomyces cerevisiae cytoplasmic and mitochondrial enzymes. PMID:1311544

  6. Cloning, sequencing, and heterologous expression of a gene coding for Arthromyces ramosus peroxidase.

    PubMed

    Sawai-Hatanaka, H; Ashikari, T; Tanaka, Y; Asada, Y; Nakayama, T; Minakata, H; Kunishima, N; Fukuyama, K; Yamada, H; Shibano, Y

    1995-07-01

    To understand the relationship between the structure and functions of the peroxidase of Arthromyces ramosus, a novel taxon of hyphomycete, and the evolutionary relationship of the A. ramosus peroxidase (ARP) with the other peroxidases, we isolated complementary and genomic DNA clones encoding ARP and characterized them. The sequence analyses of the ARP and cDNA coding for ARP showed that a mature ARP consists of 344 amino acids with an N-terminal pyroglutamic acid preceded by a signal peptide of 20 amino acid residues. The amino acid sequence of ARP was 99% identical to that of the peroxidase of Coprinus cinereus, a basidiomycete, and also had very high similarities (41-43% identity) to those of basidiomycetous lignin peroxidases, although we could find no lignin peroxidase activities for ARP when assayed with lignin model compounds. We identified His184 and His56 as proximal and distal ligands to heme, respectively, and Arg52 as an essential Arg. Comparison of the sequences of complementary and genomic DNAs found that protein-encoding DNA is interrupted by 14 intervening sequences. The ARP cDNA was expressed in the yeast Saccharomyces cerevisiae under the promoter of the glyceraldehyde 3-phosphate dehydrogenase gene, yielding 0.02 units/ml of a secreted active peroxidase.

  7. Optimization of short amino acid sequences classifier

    NASA Astrophysics Data System (ADS)

    Barcz, Aleksy; Szymański, Zbigniew

    This article describes processing methods used for the classification of short amino acid sequences. The data processed are 9-symbol string representations of amino acid sequences, divided into 49 data sets, each containing samples labeled as reacting or not reacting with a given enzyme. The goal of the classification is to determine, for a single enzyme, whether an amino acid sequence would react with it. Each data set is processed separately. Feature selection is performed to reduce the number of dimensions for each data set. The method used for feature selection consists of two phases. During the first phase, significant positions are selected using Classification and Regression Trees. Afterwards, symbols appearing at the selected positions are substituted with numeric values of amino acid properties taken from the AAindex database. In the second phase the new set of features is reduced using a correlation-based ranking formula and Gram-Schmidt orthogonalization. Finally, the preprocessed data are used for training LS-SVM classifiers. SPDE, an evolutionary algorithm, is used to obtain optimal hyperparameters for the LS-SVM classifier, such as the error penalty parameter C and kernel-specific hyperparameters. A simple score penalty is used to adapt the SPDE algorithm to the task of selecting classifiers with the best performance measure values.
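
    The substitution step at the end of the first phase, replacing the symbols at the selected positions with numeric property values, might look like the sketch below. It assumes a single property, the Kyte-Doolittle hydropathy scale, and a hand-picked list of positions; the article draws properties from the AAindex database and selects positions with Classification and Regression Trees, neither of which is reproduced here. The function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values, used here as one example of a numeric
# amino acid property; the choice of property is an assumption.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_peptide(peptide: str, positions: list) -> list:
    """Turn a 9-symbol peptide into a numeric feature vector using only
    the selected (0-based) positions."""
    if len(peptide) != 9:
        raise ValueError("expected a 9-symbol amino acid string")
    return [KYTE_DOOLITTLE[peptide[i]] for i in positions]


if __name__ == "__main__":
    # Positions 0, 4 and 8 stand in for the positions a tree-based selector might keep.
    print(encode_peptide("ARNDCQEGH", [0, 4, 8]))
```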

  8. Methods for analyzing nucleic acid sequences

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid. The method provides a complex comprising a polymerase enzyme, a target nucleic acid molecule, and a primer, wherein the complex is immobilized on a support. A fluorescent label is attached to a terminal phosphate group of the nucleotide or nucleotide analog. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The time duration of the signal from labeled nucleotides or nucleotide analogs that become incorporated is distinguished from freely diffusing labels by a longer retention in the observation volume for the nucleotides or nucleotide analogs that become incorporated than for the freely diffusing labels.

  9. Nanopore Sequencing: Electrical Measurements of the Code of Life.

    PubMed

    Timp, Winston; Mirsaidov, Utkur M; Wang, Deqiang; Comer, Jeff; Aksimentiev, Aleksei; Timp, Gregory

    2010-05-01

    Sequencing a single molecule of deoxyribonucleic acid (DNA) using a nanopore is a revolutionary concept because it combines the potential for long read lengths (>5 kbp) with high speed (1 bp/10 ns), while obviating the need for costly amplification procedures due to the exquisite single molecule sensitivity. The prospects for implementing this concept seem bright. The cost savings from the removal of required reagents, coupled with the speed of nanopore sequencing places the $1000 genome within grasp. However, challenges remain: high fidelity reads demand stringent control over both the molecular configuration in the pore and the translocation kinetics. The molecular configuration determines how the ions passing through the pore come into contact with the nucleotides, while the translocation kinetics affect the time interval in which the same nucleotides are held in the constriction as the data is acquired. Proteins like α-hemolysin and its mutants offer exquisitely precise self-assembled nanopores and have demonstrated the facility for discriminating individual nucleotides, but it is currently difficult to design protein structure ab initio, which frustrates tailoring a pore for sequencing genomic DNA. Nanopores in solid-state membranes have been proposed as an alternative because of the flexibility in fabrication and ease of integration into a sequencing platform. Preliminary results have shown that with careful control of the dimensions of the pore and the shape of the electric field, control of DNA translocation through the pore is possible. Furthermore, discrimination between different base pairs of DNA may be feasible. Thus, a nanopore promises inexpensive, reliable, high-throughput sequencing, which could thrust genomic science into personal medicine.

  10. Properties of Sequence Conservation in Upstream Regulatory and Protein Coding Sequences among Paralogs in Arabidopsis thaliana

    NASA Astrophysics Data System (ADS)

    Richardson, Dale N.; Wiehe, Thomas

    Whole genome duplication (WGD) has catalyzed the formation of new species, genes with novel functions, altered expression patterns, complexified signaling pathways and has provided organisms a level of genetic robustness. We studied the long-term evolution and interrelationships of 5’ upstream regulatory sequences (URSs), protein coding sequences (CDSs) and expression correlations (EC) of duplicated gene pairs in Arabidopsis. Three distinct methods revealed significant evolutionary conservation between paralogous URSs and were highly correlated with microarray-based expression correlation of the respective gene pairs. Positional information on exact matches between sequences unveiled the contribution of micro-chromosomal rearrangements on expression divergence. A three-way rank analysis of URS similarity, CDS divergence and EC uncovered specific gene functional biases. Transcription factor activity was associated with gene pairs exhibiting conserved URSs and divergent CDSs, whereas a broad array of metabolic enzymes was found to be associated with gene pairs showing diverged URSs but conserved CDSs.

  11. The vicilin gene family of pea (Pisum sativum L.): a complete cDNA coding sequence for preprovicilin.

    PubMed Central

    Lycett, G W; Delauney, A J; Gatehouse, J A; Gilroy, J; Croy, R R; Boulter, D

    1983-01-01

    A cDNA plasmid bank has been constructed using mRNA from developing pea seeds and three cDNAs coding for vicilin polypeptides have been selected. These cDNAs have been sequenced and between them cover the whole of the coding sequence plus part of the 5' and 3' untranslated regions. Comparison with amino acid sequence data from the protein indicates that vicilin is synthesised as preprovicilin with subsequent removal of a signal peptide and a C-terminal peptide as well as post translational endo-proteolytic cleavage. The cDNAs represent two different classes of vicilin genes whilst amino acid data show that there are at least three major classes of vicilin polypeptide. The vicilin sequences show extensive homology with conglycinin and phaseolin except in the regions of the internal proteolytic cleavages. The evolutionary significance of this relationship is discussed. PMID:6687941

  12. The Signal Sequence Coding Region Promotes Nuclear Export of mRNA

    PubMed Central

    Palazzo, Alexander F; Springer, Michael; Shibata, Yoko; Lee, Chung-Sheng; Dias, Anusha P; Rapoport, Tom A

    2007-01-01

    In eukaryotic cells, most mRNAs are exported from the nucleus by the transcription export (TREX) complex, which is loaded onto mRNAs after their splicing and capping. We have studied in mammalian cells the nuclear export of mRNAs that code for secretory proteins, which are targeted to the endoplasmic reticulum membrane by hydrophobic signal sequences. The mRNAs were injected into the nucleus or synthesized from injected or transfected DNA, and their export was followed by fluorescent in situ hybridization. We made the surprising observation that the signal sequence coding region (SSCR) can serve as a nuclear export signal of an mRNA that lacks an intron or functional cap. Even the export of an intron-containing natural mRNA was enhanced by its SSCR. Like conventional export, the SSCR-dependent pathway required the factor TAP, but depletion of the TREX components had only moderate effects. The SSCR export signal appears to be characterized in vertebrates by a low content of adenines, as demonstrated by genome-wide sequence analysis and by the inhibitory effect of silent adenine mutations in SSCRs. The discovery of an SSCR-mediated pathway explains the previously noted amino acid bias in signal sequences and suggests a link between nuclear export and membrane targeting of mRNAs. PMID:18052610

  13. TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations.

    PubMed

    Abascal, Federico; Zardoya, Rafael; Telford, Maximilian J

    2010-07-01

    We present TranslatorX, a web server designed to align protein-coding nucleotide sequences based on their corresponding amino acid translations. Many comparisons between biological sequences (nucleic acids and proteins) involve the construction of multiple alignments. Alignments represent a statement regarding the homology between individual nucleotides or amino acids within homologous genes. As protein-coding DNA sequences evolve as triplets of nucleotides (codons) and it is known that sequence similarity degrades more rapidly at the DNA than at the amino acid level, alignments are generally more accurate when based on amino acids than on their corresponding nucleotides. TranslatorX novelties include: (i) use of all documented genetic codes and the possibility of assigning different genetic codes for each sequence; (ii) a battery of different multiple alignment programs; (iii) translation of ambiguous codons when possible; (iv) an innovative criterion to clean nucleotide alignments with GBlocks based on protein information; and (v) a rich output, including Jalview-powered graphical visualization of the alignments, codon-based alignments coloured according to the corresponding amino acids, measures of compositional bias and first, second and third codon position specific alignments. The TranslatorX server is freely available at http://translatorx.co.uk.
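
    The core idea, threading the original codons back onto an amino acid alignment, can be sketched as follows. This is a minimal illustration and not TranslatorX's implementation: it assumes the nucleotide sequence is already in frame, that translation and protein alignment were done elsewhere, and that gaps are written as `-`. The function name is hypothetical.

```python
def codon_alignment(nucleotide_seq: str, aligned_protein: str) -> str:
    """Back-translate one row of a protein alignment into a codon alignment.

    Each residue is replaced by its original codon and each gap by '---',
    so nucleotide columns stay grouped in triplets.
    """
    if len(nucleotide_seq) % 3:
        raise ValueError("nucleotide sequence length must be a multiple of 3")
    codons = [nucleotide_seq[i:i + 3] for i in range(0, len(nucleotide_seq), 3)]
    n_residues = sum(1 for symbol in aligned_protein if symbol != "-")
    if n_residues != len(codons):
        raise ValueError("protein row and nucleotide sequence disagree in length")
    remaining = iter(codons)
    return "".join("---" if symbol == "-" else next(remaining)
                   for symbol in aligned_protein)


if __name__ == "__main__":
    # ATG GCT TGG encodes M A W; the protein aligner placed a gap after M.
    print(codon_alignment("ATGGCTTGG", "M-AW"))
```

    Applying the function to every row of a protein alignment gives a nucleotide alignment whose columns respect codon boundaries.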

  14. The mammalian transcriptome and the function of non-coding DNA sequences

    PubMed Central

    Shabalina, Svetlana A; Spiridonov, Nikolay A

    2004-01-01

    For decades, researchers have focused most of their attention on protein-coding genes and proteins. With the completion of the human and mouse genomes and the accumulation of data on the mammalian transcriptome, the focus now shifts to non-coding DNA sequences, RNA-coding genes and their transcripts. Many non-coding transcribed sequences are proving to have important regulatory roles, but the functions of the majority remain mysterious. PMID:15059247

  15. A minimal sequence code for switching protein structure and function.

    PubMed

    Alexander, Patrick A; He, Yanan; Chen, Yihong; Orban, John; Bryan, Philip N

    2009-12-15

    We present here a structural and mechanistic description of how a protein changes its fold and function, mutation by mutation. Our approach was to create 2 proteins that (i) are stably folded into 2 different folds, (ii) have 2 different functions, and (iii) are very similar in sequence. In this simplified sequence space we explore the mutational path from one fold to another. We show that an IgG-binding, 4beta+alpha fold can be transformed into an albumin-binding, 3-alpha fold via a mutational pathway in which neither function nor native structure is completely lost. The stabilities of all mutants along the pathway are evaluated, key high-resolution structures are determined by NMR, and an explanation of the switching mechanism is provided. We show that the conformational switch from 4beta+alpha to 3-alpha structure can occur via a single amino acid substitution. On one side of the switch point, the 4beta+alpha fold is >90% populated (pH 7.2, 20 degrees C). A single mutation switches the conformation to the 3-alpha fold, which is >90% populated (pH 7.2, 20 degrees C). We further show that a bifunctional protein exists at the switch point with affinity for both IgG and albumin. PMID:19923431

  16. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request. SUMMARY: The United States....'' SUPPLEMENTARY INFORMATION: I. Abstract Patent applications that contain nucleotide and/or amino acid...

  17. Complete coding sequences of dengue-1 viruses from Paraguay and Argentina.

    PubMed

    Avilés, G; Meissner, J; Mantovani, R; St Jeor, S

    2003-12-01

    We have determined the complete coding sequences of six dengue-1 (DEN-1) viruses isolated from Paraguay and Argentina in 2000 from patients with dengue fever. Sequences of strains 259par00, 280par00, 295arg00, 297arg00 and 301arg00 can encode a polyprotein of 3392 amino acids. Strain 293arg00 circulated as a "wild type+deletion mutant" quasispecies, with a subpopulation characterized by a 3-nucleotide deletion in the NS4A region. This variant, which would encode a three amino acid change in the NS4A protein, was found as a minority population in one additional partially-sequenced isolate from the same outbreak. These six South American strains group into two different clades of the "American-African" DEN-1 genotype: one clade is most closely related to strains isolated from Brazil in 1997, the other to a Peruvian strain isolated in 1991 for which only partial sequence information is available. DEN-1 viruses isolated worldwide comprise at least four different genotypes according to previously defined classification criteria.

  18. In search of coding and non-coding regions of DNA sequences based on balanced estimation of diffusion entropy.

    PubMed

    Zhang, Jin; Zhang, Wenqing; Yang, Huijie

    2016-01-01

    Identification of coding regions in DNA sequences remains challenging. Various methods have been proposed, but these are limited by species-dependence and the need for adequate training sets. The elements in DNA coding regions are known to be distributed in a quasi-random way, while those in non-coding regions have typical similar structures. For short sequences, these statistical characteristics cannot be extracted correctly and cannot even be detected. This paper introduces a new way to solve the problem: balanced estimation of diffusion entropy (BEDE).

  19. Molecular cloning and amino acid sequence of human 5-lipoxygenase

    SciTech Connect

    Matsumoto, T.; Funk, C.D.; Radmark, O.; Hoeoeg, J.O.; Joernvall, H.; Samuelsson, B.

    1988-01-01

    5-Lipoxygenase (EC 1.13.11.34), a Ca²⁺- and ATP-requiring enzyme, catalyzes the first two steps in the biosynthesis of the peptidoleukotrienes and the chemotactic factor leukotriene B₄. A cDNA clone corresponding to 5-lipoxygenase was isolated from a human lung lambda gt11 expression library by immunoscreening with a polyclonal antibody. Additional clones from a human placenta lambda gt11 cDNA library were obtained by plaque hybridization with the ³²P-labeled lung cDNA clone. Sequence data obtained from several overlapping clones indicate that the composite DNAs contain the complete coding region for the enzyme. From the deduced primary structure, 5-lipoxygenase encodes a 673 amino acid protein with a calculated molecular weight of 77,839. Direct analysis of the native protein and its proteolytic fragments confirmed the deduced composition, the amino-terminal amino acid sequence, and the structure of many internal segments. 5-Lipoxygenase has no apparent sequence homology with leukotriene A₄ hydrolase or Ca²⁺-binding proteins. RNA blot analysis indicated substantial amounts of an mRNA species of approximately 2700 nucleotides in leukocytes, lung, and placenta.
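
    The figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is only an illustration: the three constants come from the abstract, while the function name and the assumption of a single stop codon are ours.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
AMINO_ACIDS = 673          # length of the deduced protein (from the abstract)
MOLECULAR_WEIGHT = 77_839  # calculated molecular weight (from the abstract)
MRNA_LENGTH = 2700         # approximate mRNA length in nucleotides (from the abstract)


def coding_region_length(n_residues: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode n_residues, optionally plus one stop codon."""
    return 3 * (n_residues + (1 if include_stop else 0))


coding_nt = coding_region_length(AMINO_ACIDS)
mean_residue_mass = MOLECULAR_WEIGHT / AMINO_ACIDS
# Whatever is left over is untranslated sequence and poly(A); approximate only,
# because the mRNA length itself is an estimate from an RNA blot.
untranslated_nt = MRNA_LENGTH - coding_nt

print(coding_nt, round(mean_residue_mass, 1), untranslated_nt)
```

    Under these assumptions the coding region accounts for roughly three quarters of the reported mRNA length, which is consistent with the statement that the clones cover the complete coding region.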

  20. Comparison of the amino acid sequence of the major immunogen from three serotypes of foot and mouth disease virus.

    PubMed Central

    Makoff, A J; Paynter, C A; Rowlands, D J; Boothroyd, J C

    1982-01-01

    Cloned cDNA molecules from three serotypes of FMDV have been sequenced around the VP1-coding region. The predicted amino acid sequences for VP1 were compared with the published sequences and variable regions identified. The amino acid sequences were also analysed for hydrophilic regions. Two of the variable regions, numbered 129-160 and 193-204, overlapped hydrophilic regions and were therefore identified as potentially immunogenic. These regions overlap regions shown by others to be immunogenic. PMID:6298715

  1. Cloning and nucleotide sequence of the gene coding for citrate synthase from a thermotolerant Bacillus sp

    SciTech Connect

    Schendel, F.J.; August, P.R.; Anderson, C.R.; Flickinger, M.C.; Hanson, R.S.

    1992-01-01

    Acetate salts are emerging as potentially attractive bulk chemicals for a variety of environmental applications, for example, as catalysts to facilitate combustion of high-sulfur coal by electrical utilities and as the biodegradable noncorrosive highway deicing salt calcium magnesium acetate. The structural gene coding for citrate synthase from the gram-positive soil isolate Bacillus sp. strain C4 (ATCC 55182) capable of secreting acetic acid at pH 5.0 to 7.0 in the presence of dolime has been cloned from a genomic library by complementation of an Escherichia coli auxotrophic mutant lacking citrate synthase. The nucleotide sequence of the entire 3.1-kb HindIII fragment has been determined, and one major open reading frame was found coding for citrate synthase (ctsA). Citrate synthase from Bacillus sp. strain C4 was found to be a dimer (Mr, 84,500) with a subunit with an Mr of 42,000. The N-terminal sequence was found to be identical with that predicted from the gene sequence. The kinetics were best fit to a bisubstrate enzyme with an ordered mechanism. Bacillus sp. strain C4 citrate synthase was not activated by potassium chloride and was not inhibited by NADH, ATP, ADP, or AMP at levels up to 1 mM. The predicted amino acid sequence was compared with that of the E. coli, Acinetobacter anitratum, Pseudomonas aeruginosa, Rickettsia prowazekii, porcine heart, and Saccharomyces cerevisiae cytoplasmic and mitochondrial enzymes.

  2. Genetic Code Expansion of Mammalian Cells with Unnatural Amino Acids.

    PubMed

    Brown, Kalyn A; Deiters, Alexander

    2015-09-01

    The expansion of the genetic code of mammalian cells enables the incorporation of unnatural amino acids into proteins. This is achieved by adding components to the protein biosynthetic machinery, specifically an engineered aminoacyl-tRNA synthetase/tRNA pair. The unnatural amino acids are chemically synthesized and supplemented to the growth medium. Using this methodology, fundamental new chemistries can be added to the functional repertoire of the genetic code of mammalian cells. This protocol outlines the steps necessary to incorporate a photocaged lysine into proteins and showcases its application in the optical triggering of protein translocation to the nucleus.

  3. Detection of nucleic acid sequences by invader-directed cleavage

    DOEpatents

    Brow, Mary Ann D.; Hall, Jeff Steven Grotelueschen; Lyamichev, Victor; Olive, David Michael; Prudent, James Robert

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The 5' nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge.

  4. Coding in 2D: Using Intentional Dispersity to Enhance the Information Capacity of Sequence-Coded Polymer Barcodes.

    PubMed

    Laure, Chloé; Karamessini, Denise; Milenkovic, Olgica; Charles, Laurence; Lutz, Jean-François

    2016-08-26

    A 2D approach was studied for the design of polymer-based molecular barcodes. Uniform oligo(alkoxyamine amide)s, containing a monomer-coded binary message, were synthesized by orthogonal solid-phase chemistry. Sets of oligomers with different chain-lengths were prepared. The physical mixture of these uniform oligomers leads to an intentional dispersity (1st dimension fingerprint), which is measured by electrospray mass spectrometry. Furthermore, the monomer sequence of each component of the mass distribution can be analyzed by tandem mass spectrometry (2nd dimension sequencing). By summing the sequence information of all components, a binary message can be read. A 4-bytes extended ASCII-coded message was written on a set of six uniform oligomers. Alternatively, a 3-bytes sequence was written on a set of five oligomers. In both cases, the coded binary information was recovered. PMID:27484303

  5. Genetic code correlations - Amino acids and their anticodon nucleotides

    NASA Technical Reports Server (NTRS)

    Weber, A. L.; Lacey, J. C., Jr.

    1978-01-01

    The data here show direct correlations between both the hydrophobicity and the hydrophilicity of the homocodonic amino acids and their anticodon nucleotides. While the differences between properties of uracil and cytosine derivatives are small, further data show that uracil has an affinity for charged species. Although these data suggest that molecular relationships between amino acids and anticodons were responsible for the origin of the code, it is not clear what the mechanism of the origin might have been.

  6. Functional annotation of non-coding sequence variants

    PubMed Central

    Ritchie, Graham R. S.; Dunham, Ian; Zeggini, Eleftheria; Flicek, Paul

    2016-01-01

    Identifying functionally relevant variants against the background of ubiquitous genetic variation is a major challenge in human genetics. For variants that fall in protein-coding regions our understanding of the genetic code and splicing allow us to identify likely candidates, but interpreting variants that fall outside of genic regions is more difficult. Here we present a new tool, GWAVA, which supports prioritisation of non-coding variants by integrating a range of annotations. PMID:24487584

  7. Non-extensive trends in the size distribution of coding and non-coding DNA sequences in the human genome

    NASA Astrophysics Data System (ADS)

    Oikonomou, Th.; Provata, A.

    2006-03-01

    We study the primary DNA structure of four of the most completely sequenced human chromosomes (including chromosome 19, which is the most dense in coding), using non-extensive statistics. We show that the exponents governing the spatial decay of the coding size distributions vary between 5.2 ≤ r ≤ 5.7 for the short scales and 1.45 ≤ q ≤ 1.50 for the large scales. On the contrary, the exponents governing the spatial decay of the non-coding size distributions in these four chromosomes take the values 2.4 ≤ r ≤ 3.2 for the short scales and 1.50 ≤ q ≤ 1.72 for the large scales. These results, in particular the values of the tail exponent q, indicate the existence of correlations in the coding and non-coding size distributions with tendency for higher correlations in the non-coding DNA.

  8. A convolutional code-based sequence analysis model and its application.

    PubMed

    Liu, Xiao; Geng, Xiaoli

    2013-04-16

    A new approach for encoding DNA sequences as input for DNA sequence analysis is proposed using the error correction coding theory of communication engineering. The encoder was designed as a convolutional code model whose generator matrix is designed based on the degeneracy of codons, with a codon treated in the model as an informational unit. The utility of the proposed model was demonstrated through the analysis of twelve prokaryote and nine eukaryote DNA sequences having different GC contents. Distinct differences in code distances were observed near the initiation and termination sites in the open reading frame, which provided a well-regulated characterization of the DNA sequences. Clearly distinguished period-3 features appeared in the coding regions, and the characteristic average code distances of the analyzed sequences were approximately proportional to their GC contents, particularly in the selected prokaryotic organisms, presenting the potential utility as an added taxonomic characteristic for use in studying the relationships of living organisms.
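
    Two quantities mentioned in this abstract, GC content and the period-3 feature of coding regions, can be illustrated with a short sketch. Note that this is not the convolutional-code model of the paper: the period-3 measure below is the common Fourier approach based on base indicator sequences, swapped in purely for illustration, and the function names are hypothetical.

```python
import cmath
import math


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def period3_power(seq: str) -> float:
    """Spectral power at frequency 1/3, summed over the four base indicator sequences.

    For each base, positions where it occurs contribute a unit phasor
    exp(-2*pi*i*n/3); a strong period-3 bias makes the phasors add coherently.
    """
    seq = seq.upper()
    total = 0.0
    for base in "ACGT":
        component = sum(
            cmath.exp(-2j * math.pi * n / 3)
            for n, b in enumerate(seq)
            if b == base
        )
        total += abs(component) ** 2
    return total


print(gc_content("ATGC"))           # half of the bases are G or C
print(period3_power("ATG" * 10))    # perfectly periodic toy sequence
print(period3_power("A" * 30))      # no period-3 structure
```

    In practice such a measure would be evaluated in a sliding window along a genome; the toy inputs here only contrast a perfectly periodic string with a uniform one.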

  9. Hybridization and sequencing of nucleic acids using base pair mismatches

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2001-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  10. Codes in the codons: construction of a codon/amino acid periodic table and a study of the nature of specific nucleic acid-protein interactions.

    PubMed

    Benyo, B; Biro, J C; Benyo, Z

    2004-01-01

    The theory of "codon-amino acid coevolution" was first proposed by Woese in 1967. It suggests that there is a stereochemical matching - that is, affinity - between amino acids and certain of the base triplet sequences that code for those amino acids. We have constructed a common periodic table of codons and amino acids, where the nucleic acid table showed perfect axial symmetry for codons and the corresponding amino acid table also displayed periodicity regarding the biochemical properties (charge and hydrophobicity) of the 20 amino acids and the position of the stop signals. The table indicates that the middle (2nd) amino acid in the codon has a prominent role in determining some of the structural features of the amino acids. The possibility that physical contact between codons and amino acids might exist was tested on restriction enzymes. Many recognition site-like sequences were found in the coding sequences of these enzymes and as many as 73 examples of codon-amino acid co-location were observed in the 7 known 3D structures (December 2003) of endonuclease-nucleic acid complexes. These results indicate that the smallest possible units of specific nucleic acid-protein interaction are indeed the stereochemically compatible codons and amino acids.

  11. Cloning and nucleotide sequence of the genes coding for the Sau96I restriction and modification enzymes.

    PubMed Central

    Szilák, L; Venetianer, P; Kiss, A

    1990-01-01

    The genes coding for the GGNCC specific Sau96I restriction and modification enzymes were cloned and expressed in E. coli. The DNA sequence predicts a 430 amino acid protein (Mr: 49,252) for the methyltransferase and a 261 amino acid protein (Mr: 30,486) for the endonuclease. No protein sequence similarity was detected between the Sau96I methyltransferase and endonuclease. The methyltransferase contains the sequence elements characteristic for m5C-methyltransferases. In addition to this, M.Sau96I shows similarity, also in the variable region, with one m5C-methyltransferase (M.SinI) which has closely related recognition specificity (GGA/TCC). M.Sau96I methylates the internal cytosine within the GGNCC recognition sequence. The Sau96I endonuclease appears to act as a monomer. PMID:2204026
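
    The GGNCC recognition sequence named in this abstract, where N stands for any base, can be located in a DNA string with a regular expression. The following is a minimal sketch; the function name and the reduced IUPAC table are illustrative assumptions, and only the motif itself comes from the abstract.

```python
import re

# Minimal IUPAC-to-regex table; only the symbols needed for GGNCC are included.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def find_sites(dna: str, motif: str = "GGNCC") -> list[int]:
    """Return 0-based start positions of every occurrence of motif in dna.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    pattern = "".join(IUPAC[symbol] for symbol in motif)
    return [m.start() for m in re.finditer(f"(?=({pattern}))", dna.upper())]


print(find_sites("AAGGACCTTGGTCC"))
```

    Because GGNCC reads the same on both strands (its reverse complement is again GGNCC), scanning one strand is enough to find all sites.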

  12. Associations of single nucleotide polymorphisms in the Pygo2 coding sequence with idiopathic oligospermia and azoospermia.

    PubMed

    Ge, S-Q; Grifin, J; Liu, L-H; Aston, K I; Simon, L; Jenkins, T G; Emery, B R; Carrell, D T

    2015-08-07

    Male infertility is often associated with a decreased sperm count. The Pygo2 gene is expressed in the elongating spermatid during chromatin remodeling; thus impairment in PYGO2 function might lead to spermatogenic arrest, sperm count reduction, and subsequent infertility. The aim of this study was to identify mutations in Pygo2 that might lead to idiopathic oligospermia and azoospermia. DNA was isolated from venous blood from 77 men with normal fertility and 195 men with idiopathic oligospermia or azoospermia. Polymerase chain reaction-sequencing analysis was performed for the three Pygo2 coding regions. Non-synonymous single nucleotide polymorphisms (SNPs) were detected and analyzed using the SIFT, Polyphen-2, and Mutation Taster software to identify possible changes in protein structure that could affect phenotype. Pygo2 sequencing was successful for 178 patients (30 with mild or moderate oligospermia, 57 with severe oligospermia, and 91 with azoospermia). Three previously reported non-synonymous SNPs were identified in patients with azoospermia or severe oligospermia but not in those with mild or moderate oligozoospermia or normozoospermia. SNPs rs61758740 (M141I) and rs141722381 (N240I) cause the replacement of one hydrophobic or hydrophilic amino acid, respectively, with another, and SNP rs61758741 (K261E) causes the replacement of a basic amino acid with an acidic one. The software predictions demonstrated that SNP rs141722381 would likely result in disrupted tertiary protein structure and thus could be involved in disease pathogenesis. Overall, this study demonstrated that SNPs in the coding region of Pygo2 might be one of the causative factors in idiopathic oligospermia and azoospermia, resulting in male infertility.

  13. The origin of the biologically coded amino acids.

    PubMed

    Cleaves, H James

    2010-04-21

    Biology uses essentially 20 amino acids for its coded protein enzymes, representing a very small subset of the structurally possible set. Most models of the origin of life suggest organisms developed from environmentally available organic compounds. A variety of amino acids are easily produced under conditions which were believed to have existed on the primitive Earth or in the early solar nebula. The types of amino acids produced depend on the conditions which prevailed at the time of synthesis, which remain controversial. The selection of the biological set is likely due to chemical and early biological evolution acting on the environmentally available compounds based on their chemical properties. Once life arose, selection would have proceeded based on the functional utility of amino acids coupled with their accessibility by primitive metabolism and their compatibility with other biochemical processes. Some possible mechanisms by which the modern set of 20 amino acids was selected starting from prebiotic chemistry are discussed. PMID:20034500

  14. The Coding of Biological Information: From Nucleotide Sequence to Protein Recognition

    NASA Astrophysics Data System (ADS)

    Štambuk, Nikola

    The paper reviews the classic results of Swanson, Dayhoff, Grantham, Blalock and Root-Bernstein, which link genetic code nucleotide patterns to the protein structure, evolution and molecular recognition. Symbolic representation of the binary addresses defining particular nucleotide and amino acid properties is discussed, with consideration of: structure and metric of the code, direct correspondence between amino acid and nucleotide information, and molecular recognition of the interacting protein motifs coded by the complementary DNA and RNA strands.

  15. Conversion of amino-acid sequence in proteins to classical music: search for auditory patterns

    PubMed Central

    2007-01-01

    We have converted genome-encoded protein sequences into musical notes to reveal auditory patterns without compromising musicality. We derived a reduced range of 13 base notes by pairing similar amino acids and distinguishing them using variations of three-note chords and codon distribution to dictate rhythm. The conversion will help make genomic coding sequences more approachable for the general public, young children, and vision-impaired scientists. PMID:17477882

  16. Analysis of similarity/dissimilarity of DNA sequences based on convolutional code model.

    PubMed

    Liu, Xiao; Tian, Feng Chun; Wang, Shi Yuan

    2010-02-01

    Based on the convolutional code model of error-correction coding theory, we propose an approach to characterize and compare DNA sequences with consideration of the effect of codon context. We construct an 8-component vector whose components are the normalized leading eigenvalues of the L/L and M/M matrices associated with the original DNA sequences and the transformed sequences. The utility of our approach is illustrated by the examination of the similarities/dissimilarities among the coding sequences of the first exon of beta-globin gene of 11 species, and the efficiency of error-correction coding theory in analysis of similarity/dissimilarity of DNA sequences is represented.

  17. Coded excitation using periodic and unipolar M-sequences for photoacoustic imaging and flow measurement.

    PubMed

    Zhang, Haichong K; Kondo, Kengo; Yamakawa, Makoto; Shiina, Tsuyoshi

    2016-01-11

    Photoacoustic imaging is an emerging imaging technology combining optical imaging with ultrasound. Imaging of the optical absorption coefficient and flow measurement provides additional functional information compared to ultrasound. The issue with photoacoustic imaging is its low signal-to-noise ratio (SNR) due to scattering or attenuation; this is especially problematic when high pulse repetition frequency (PRF) lasers are used. In previous research, coded excitation utilizing several pseudorandom sequences has been considered as a solution for the problem. However, previously proposed temporal coding procedures using Golay codes or M-sequences are so complex that it was necessary to send a sequence twice to realize a bipolar sequence. Here, we propose a periodic and unipolar sequence (PUM), which is a periodic sequence derived from an m-sequence. The PUM can enhance signals without causing coding artifacts for single wavelength excitation. In addition, it is possible to increase the temporal resolution since the decoding start point can be set to any code in periodic irradiation, while only the first code of a sequence was available for conventional aperiodic irradiation. The SNR improvement and the increase in temporal resolution were experimentally validated through imaging evaluation and flow measurement. PMID:26832234
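
    The idea of a periodic, unipolar excitation derived from an m-sequence can be illustrated with a small sketch. It generates a length-7 m-sequence with a linear feedback shift register and correlates the periodic 0/1 (unipolar) sequence with its +1/-1 (bipolar) version. This is only a generic demonstration of the m-sequence correlation property; the register length, tap positions and decoding step are our assumptions, not the authors' PUM procedure.

```python
def m_sequence(n_bits=3, taps=(2, 3)):
    """One period (2**n_bits - 1 chips) of a 0/1 m-sequence from a Fibonacci LFSR.

    The default taps correspond to a primitive degree-3 polynomial; other
    register lengths need their own primitive tap set to stay maximal-length.
    """
    state = [1] * n_bits              # any non-zero seed works
    out = []
    for _ in range(2 ** n_bits - 1):
        out.append(state[-1])         # emit the last register cell
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]  # XOR of the tapped cells
        state = [feedback] + state[:-1]
    return out


def circular_correlation(received, reference):
    """Periodic cross-correlation of two equal-length sequences at every lag."""
    n = len(received)
    return [
        sum(received[i] * reference[(i + k) % n] for i in range(n))
        for k in range(n)
    ]


unipolar = m_sequence()                    # chips that a laser could emit (on/off)
bipolar = [2 * b - 1 for b in unipolar]    # +1/-1 reference used for decoding
response = circular_correlation(unipolar, bipolar)
print(unipolar)
print(response)
```

    With periodic transmission the off-peak correlation values vanish for this sequence, which is why a unipolar on/off pattern can still be decoded into a single sharp peak.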

  18. Predicting intrinsic disorder from amino acid sequence.

    PubMed

    Obradovic, Zoran; Peng, Kang; Vucetic, Slobodan; Radivojac, Predrag; Brown, Celeste J; Dunker, A Keith

    2003-01-01

    Blind predictions of intrinsic order and disorder were made on 42 proteins subsequently revealed to contain 9,044 ordered residues, 284 disordered residues in 26 segments of length 30 residues or less, and 281 disordered residues in 2 disordered segments of length greater than 30 residues. The accuracies of the six predictors used in this experiment ranged from 77% to 91% for the ordered regions and from 56% to 78% for the disordered segments. The average of the order and disorder predictions ranged from 73% to 77%. The prediction of disorder in the shorter segments was poor, from 25% to 66% correct, while the prediction of disorder in the longer segments was better, from 75% to 95% correct. Four of the predictors were composed of ensembles of neural networks. This enabled them to deal more efficiently with the large asymmetry in the training data through diversified sampling from the significantly larger ordered set and achieve better accuracy on ordered and long disordered regions. The exclusive use of long disordered regions for predictor training likely contributed to the disparity of the predictions on long versus short disordered regions, while averaging the output values over 61-residue windows to eliminate short predictions of order or disorder probably contributed to the even greater disparity for three of the predictors. This experiment supports the predictability of intrinsic disorder from amino acid sequence. PMID:14579347

  19. Sequence of the 3'-noncoding and adjacent coding regions of human gamma-globin mRNA.

    PubMed Central

    Poon, R; Kan, Y W; Boyer, H W

    1978-01-01

    In cloning human fetal globin cDNA in bacterial plasmids, we obtained a recombinant which contained a fragment of gamma-globin cDNA corresponding to the region from amino acid 99 to the poly A. We determined a sequence of 169 nucleotides which included the complete 3' non-coding region of the gamma-globin mRNA. The codon for amino acid 136 was GCA, indicating that this cloned fragment was derived from the Agamma-globin gene. In conjunction with the surrounding sequences, the GCA codon provides the Agamma-species with a unique CTGCAG hexanucleotide that is recognized by the restriction enzyme Pst I. The 3'-untranslated region of the gamma-globin mRNA consists of 90 nucleotides, and shares little homology with that of the human beta-globin mRNA. As in other mammalian mRNAs, a symmetrical sequence and the hexanucleotide AAUAAA are present. PMID:318163

  20. Peculiar symmetry of DNA sequences and evidence suggesting its evolutionary origin in a primeval genetic code

    NASA Astrophysics Data System (ADS)

    Jolivet, R.; Rothen, F.

    2001-08-01

    Statistical analysis of the distribution of codons in DNA coding sequences of bacteria or archaea suggests that, at some stage of the prebiotic world, the most successful RNA replicating sequences afforded some tendency toward a weak form of palindromic symmetry, namely complementary symmetry. As a consequence, as soon as the machinery allowing translation into proteins was beginning to settle, we assume that primeval versions of the genetic code essentially consisted of pairs of sense-antisense codons. Present-day DNA sequences display footprints of this early symmetry, provided that statistics are made over coding sequences issued from groups of organisms and not only from the genome of an individual species. These fossil traces are proven to be significant from the statistical point of view. They shed some light onto the possible evolution of the genetic code and set some constraints on the way it had to follow.
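
    The notion of sense-antisense codon pairs can be made concrete with a short sketch that counts codons in a coding sequence and groups each codon with its reverse complement. We assume here that "complementary symmetry" refers to reverse-complement pairing; the function names are hypothetical and the code is not the authors' statistical procedure.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(codon: str) -> str:
    """Return the antisense partner of a codon, read 5' to 3'."""
    return codon.translate(COMPLEMENT)[::-1]


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def sense_antisense_pairs(counts: Counter) -> dict:
    """Map each (codon, partner) pair to the counts of its two members."""
    pairs = {}
    for codon in counts:
        partner = reverse_complement(codon)
        key = tuple(sorted((codon, partner)))
        pairs[key] = (counts[key[0]], counts[key[1]])
    return pairs


counts = codon_counts("ATGCATATGAA")
print(sense_antisense_pairs(counts))
```

    A symmetry of the kind described would show up as similar counts within each pair once the statistics are pooled over many coding sequences, as the abstract emphasizes.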

  1. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2002-01-01

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  2. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2006-07-04

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  3. Stochastic model of homogeneous coding and latent periodicity in DNA sequences.

    PubMed

    Chaley, Maria; Kutyrkin, Vladimir

    2016-02-01

    The concept of latent triplet periodicity in coding DNA sequences, which has been extensively discussed earlier, is confirmed by analysis of a number of eukaryotic genomes, where latent periodicity of a new type, called profile periodicity, is recognized in the CDSs. An original model of Stochastic Homogeneous Organization of Coding (SHOC-model) in a textual string is proposed. This model explains the existence of latent profile periodicity and regularity in DNA sequences. PMID:26656186

  4. Indoor Mobile Positioning Based on Lidar Data and Coded Sequence Pattern

    NASA Astrophysics Data System (ADS)

    Wang, Z.; Dong, B.; Chen, D.

    2016-10-01

    This paper proposes a coded sequence pattern for automatic matching of LiDAR point data. The methods include SIFT features, Otsu segmentation and the fast Hough transform for the identification, positioning and interpretation of the coded sequence patterns, and the POSIT model for fast computation of the translation and rotation parameters of LiDAR point data, so as to achieve fast matching of LiDAR point data and automatic 3D mapping of indoor shafts and tunnels.

  5. Kit for detecting nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2001-01-01

    A kit is provided for detecting a target nucleic acid sequence in a sample, the kit comprising: a first hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the first hybridization probe including a first complexing agent for forming a binding pair with a second complexing agent; and a second hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the first hybridization probe does not selectively hybridize, the second hybridization probe including a detectable marker; a third hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the third hybridization probe including the same detectable marker as the second hybridization probe; and a fourth hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the third hybridization probe does not selectively hybridize, the fourth hybridization probe including the first complexing agent for forming a binding pair with the second complexing agent; wherein the first and second hybridization probes are capable of simultaneously hybridizing to the target sequence and the third and fourth hybridization probes are capable of simultaneously hybridizing to the target sequence, the detectable marker is not present on the first or fourth hybridization probes and the first, second, third, and fourth hybridization probes each include a competitive nucleic acid sequence which is sufficiently complementary to a third portion of the target sequence that the competitive sequences of the first, second, third, and fourth hybridization probes compete with each other to hybridize to the third portion of the

  6. The Coding and Effector Transfer of Movement Sequences

    ERIC Educational Resources Information Center

    Kovacs, Attila J.; Muhlbauer, Thomas; Shea, Charles H.

    2009-01-01

    Three experiments utilizing a 14-element arm movement sequence were designed to determine if reinstating the visual-spatial coordinates, which require movements to the same spatial locations utilized during acquisition, results in better effector transfer than reinstating the motor coordinates, which require the same pattern of homologous muscle…

  7. Correcting sequencing errors in DNA coding regions using a dynamic programming approach.

    PubMed

    Xu, Y; Mural, R J; Uberbacher, E C

    1995-04-01

    This paper presents an algorithm for detecting and 'correcting' sequencing errors that occur in DNA coding regions. The types of sequencing errors addressed are insertions and deletions (indels) of DNA bases. The goal is to provide a capability which makes single-pass or low-redundancy sequence data more informative, reducing the need for high-redundancy sequencing for gene identification and characterization purposes. This would permit improved sequencing efficiency and reduce genome sequencing costs. The algorithm detects sequencing errors by discovering changes in the statistically preferred reading frame within a putative coding region and then inserts a number of 'neutral' bases at a perceived reading frame transition point to make the putative exon candidate frame consistent. We have implemented the algorithm as a front-end subsystem of the GRAIL DNA sequence analysis system to construct a version which is very error tolerant and also intend to use this as a testbed for further development of sequencing error-correction technology. Preliminary test results have shown the usefulness of this algorithm and also exhibited some of its weaknesses, providing possible directions for further improvement. On a test set consisting of 68 human DNA sequences with 1% randomly generated indels in coding regions, the algorithm detected and corrected 76% of the indels. The average distance between the position of an indel and the predicted one was 9.4 bases. With this subsystem in place, GRAIL correctly predicted 89% of the coding messages with 10% false message on the 'corrected' sequences, compared to 69% correctly predicted coding messages and 11% falsely predicted messages on the 'corrupted' sequences using standard GRAIL II method (version 1.2).(ABSTRACT TRUNCATED AT 250 WORDS)

  8. Relation between mRNA expression and sequence information in Desulfovibrio vulgaris: Combinatorial contributions of upstream regulatory motifs and coding sequence features to variations in mRNA abundance

    SciTech Connect

    Wu, Gang; Nie, Lei; Zhang, Weiwen

    2006-05-26

    The context-dependent expression of genes is the core for biological activities, and significant attention has been given to identification of various factors contributing to gene expression at genomic scale. However, so far this type of analysis has focused either on the relation between mRNA expression and non-coding sequence features such as upstream regulatory motifs or on the correlation between mRNA abundance and non-random features in coding sequences (e.g. codon usage and amino acid usage). In this study multiple regression analyses of the mRNA abundance and all sequence information in Desulfovibrio vulgaris were performed, with the goal of investigating how much coding and non-coding sequence features contribute to the variations in mRNA expression, and in what manner they act together...

  9. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    States, David J.

    2004-07-28

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  10. Solid phase sequencing of double-stranded nucleic acids

    DOEpatents

    Fu, Dong-Jing; Cantor, Charles R.; Koster, Hubert; Smith, Cassandra L.

    2002-01-01

    This invention relates to methods for detecting and sequencing of target double-stranded nucleic acid sequences, to nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include nucleic acids in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated determination of molecular weights and identification of the target sequence.

  11. Do Intron and Coding Sequences of Some Human-Mouse Orthologs Evolve as a Single Unit?

    PubMed

    Fuertes, Miguel Angel; Rodrigo, José Ramón; Alonso, Carlos

    2016-06-01

    It has been previously suggested that both the coding and the associated non-coding sequences of some human-mouse orthologs could evolve as a single unit. This letter deals with the observation that between mouse and human some orthologs significantly change their compositional features, an indication that the molecular evolution is a local process. Moreover, the data shown indicate that the coding and the intron sequences of these orthologs do not evolve independently but instead both undergo a concerted evolution, evolving as a single unit, from a compositional cluster in mouse to a different compositional cluster in human. PMID:27220874

  12. Correcting sequencing errors in DNA coding regions using a dynamic programming approach

    SciTech Connect

    Xu, Y.; Mural, R.J.; Uberbacher, E.C.

    1994-12-01

    This paper presents an algorithm for detecting and 'correcting' sequencing errors that occur in DNA coding regions. The types of sequencing error addressed include insertions and deletions (indels) of DNA bases. The goal is to provide a capability which makes single-pass or low-redundancy sequence data more informative, reducing the need for high-redundancy sequencing for gene identification and characterization purposes. The algorithm detects sequencing errors by discovering changes in the statistically preferred reading frame within a putative coding region and then inserts a number of 'neutral' bases at a perceived reading frame transition point to make the putative exon candidate frame consistent. The authors have implemented the algorithm as a front-end subsystem of the GRAIL DNA sequence analysis system to construct a version which is very error tolerant and also intend to use this as a testbed for further development of sequencing error-correction technology. On a test set consisting of 68 human DNA sequences with 1% randomly generated indels in coding regions, the algorithm detected and corrected 76% of the indels. The average distance between the position of an indel and the predicted one was 9.4 bases. With this subsystem in place, GRAIL correctly predicted 89% of the coding messages with 10% false message on the 'corrected' sequences, compared to 69% correctly predicted coding messages and 11% falsely predicted messages on the 'corrupted' sequences using standard GRAIL II method. The method uses a dynamic programming algorithm, and runs in time and space linear in the size of the input sequence.
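
    The frame-tracking idea in the abstract above can be sketched as a Viterbi-style dynamic program. This is a minimal illustration under stated assumptions, not the GRAIL implementation: the per-position frame scores, the fixed switch_penalty, and the function names are all hypothetical. Positions where the best-scoring frame path changes frame are the candidate indel sites.

```python
def best_frame_path(scores, switch_penalty=2.0):
    """Return the highest-scoring reading-frame label (0, 1 or 2) per position.

    scores[i][f] is a hypothetical preference score for frame f at position i.
    Changing frame between adjacent positions costs switch_penalty (assumed).
    """
    n = len(scores)
    if n == 0:
        return []
    best = list(scores[0])   # best[f]: best total score of a path ending in frame f
    back = []                # back[i-1][f]: frame chosen at position i-1 for that path
    for i in range(1, n):
        new_best, pointers = [], []
        for f in range(3):
            # Stay in frame f for free, or arrive from another frame at a cost.
            candidates = [best[g] - (0.0 if g == f else switch_penalty)
                          for g in range(3)]
            g_star = max(range(3), key=lambda g: candidates[g])
            new_best.append(candidates[g_star] + scores[i][f])
            pointers.append(g_star)
        best = new_best
        back.append(pointers)
    # Trace back from the best final frame.
    f = max(range(3), key=lambda g: best[g])
    path = [f]
    for pointers in reversed(back):
        f = pointers[f]
        path.append(f)
    path.reverse()
    return path


def frame_transitions(path):
    """Positions where the preferred frame changes (candidate indel sites)."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]
```

    Each position does a constant amount of work over three frames, so the sketch runs in time and space linear in the sequence length, in line with the complexity the abstract reports. The penalty keeps a single noisy position from triggering a spurious frame change.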

  13. Draft genome sequence of the docosahexaenoic acid producing thraustochytrid Aurantiochytrium sp. T66.

    PubMed

    Liu, Bin; Ertesvåg, Helga; Aasen, Inga Marie; Vadstein, Olav; Brautaset, Trygve; Heggeset, Tonje Marita Bjerkan

    2016-06-01

    Thraustochytrids are unicellular, marine protists, and there is a growing industrial interest in these organisms, particularly because some species, including strains belonging to the genus Aurantiochytrium, accumulate high levels of docosahexaenoic acid (DHA). Here, we report the draft genome sequence of Aurantiochytrium sp. T66 (ATCC PRA-276), with a size of 43 Mbp, and 11,683 predicted protein-coding sequences. The data has been deposited at DDBJ/EMBL/Genbank under the accession LNGJ00000000. The genome sequence will contribute new insight into DHA biosynthesis and regulation, providing a basis for metabolic engineering of thraustochytrids. PMID:27222814

  14. Draft genome sequence of the docosahexaenoic acid producing thraustochytrid Aurantiochytrium sp. T66.

    PubMed

    Liu, Bin; Ertesvåg, Helga; Aasen, Inga Marie; Vadstein, Olav; Brautaset, Trygve; Heggeset, Tonje Marita Bjerkan

    2016-06-01

    Thraustochytrids are unicellular, marine protists, and there is a growing industrial interest in these organisms, particularly because some species, including strains belonging to the genus Aurantiochytrium, accumulate high levels of docosahexaenoic acid (DHA). Here, we report the draft genome sequence of Aurantiochytrium sp. T66 (ATCC PRA-276), with a size of 43 Mbp, and 11,683 predicted protein-coding sequences. The data has been deposited at DDBJ/EMBL/Genbank under the accession LNGJ00000000. The genome sequence will contribute new insight into DHA biosynthesis and regulation, providing a basis for metabolic engineering of thraustochytrids.

  15. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

    PubMed

    Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles

  16. Dynamite: a flexible code generating language for dynamic programming methods used in sequence comparison.

    PubMed

    Birney, E; Durbin, R

    1997-01-01

    We have developed a code generating language, called Dynamite, specialised for the production and subsequent manipulation of complex dynamic programming methods for biological sequence comparison. From a relatively simple text definition file Dynamite will produce a variety of implementations of a dynamic programming method, including database searches and linear space alignments. The speed of the generated code is comparable to hand written code, and the additional flexibility has proved invaluable in designing and testing new algorithms. An innovation is a flexible labelling system, which can be used to annotate the original sequences with biological information. We illustrate the Dynamite syntax and flexibility by showing definitions for dynamic programming routines (i) to align two protein sequences under the assumption that they are both poly-topic transmembrane proteins, with the simultaneous assignment of transmembrane helices and (ii) to align protein information to genomic DNA, allowing for introns and sequencing error.

  17. Cloning and sequence analysis of a cDNA clone coding for the mouse GM2 activator protein.

    PubMed Central

    Bellachioma, G; Stirling, J L; Orlacchio, A; Beccari, T

    1993-01-01

    A cDNA (1.1 kb) containing the complete coding sequence for the mouse GM2 activator protein was isolated from a mouse macrophage library using a cDNA for the human protein as a probe. There was a single ATG located 12 bp from the 5' end of the cDNA clone followed by an open reading frame of 579 bp. Northern blot analysis of mouse macrophage RNA showed that there was a single band with a mobility corresponding to a size of 2.3 kb. We deduce from this that the mouse mRNA, in common with the mRNA for the human GM2 activator protein, has a long 3' untranslated sequence of approx. 1.7 kb. Alignment of the mouse and human deduced amino acid sequences showed 68% identity overall and 75% identity for the sequence on the C-terminal side of the first 31 residues, which in the human GM2 activator protein contains the signal peptide. Hydropathicity plots showed great similarity between the mouse and human sequences even in regions of low sequence similarity. There is a single N-glycosylation site in the mouse GM2 activator protein sequence (Asn151-Phe-Thr) which differs in its location from the single site reported in the human GM2 activator protein sequence (Asn63-Val-Thr). PMID:7689829
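
    The 3' untranslated length quoted above follows from simple subtraction, sketched here. The calculation assumes that the 5' end of the cDNA clone lies close to the 5' end of the mRNA and ignores the poly(A) tail; neither assumption is stated in the abstract, and the function name is illustrative.

```python
# Lengths in base pairs, taken from the abstract above.
MRNA_LENGTH = 2300        # single Northern blot band of about 2.3 kb
FIVE_PRIME_LEADER = 12    # ATG located 12 bp from the 5' end of the clone
ORF_LENGTH = 579          # open reading frame


def three_prime_utr_estimate(mrna_len, leader_len, orf_len):
    """Estimate the 3' untranslated length as whatever the leader and ORF leave over."""
    return mrna_len - leader_len - orf_len


utr = three_prime_utr_estimate(MRNA_LENGTH, FIVE_PRIME_LEADER, ORF_LENGTH)
print(utr)  # 1709, i.e. roughly 1.7 kb
```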

  18. Comparison of Exome and Genome Sequencing Technologies for the Complete Capture of Protein‐Coding Regions

    PubMed Central

    Lelieveld, Stefan H.; Spielmann, Malte; Mundlos, Stefan; Veltman, Joris A.

    2015-01-01

    For next‐generation sequencing technologies, sufficient base‐pair coverage is the foremost requirement for the reliable detection of genomic variants. We investigated whether whole‐genome sequencing (WGS) platforms offer improved coverage of coding regions compared with whole‐exome sequencing (WES) platforms, and compared single‐base coverage for a large set of exome and genome samples. We find that WES platforms have improved considerably in recent years, but at comparable sequencing depth, WGS outperforms WES in terms of covered coding regions. At higher sequencing depth (95x–160x), WES successfully captures 95% of the coding regions with a minimal coverage of 20x, compared with 98% for WGS at 87‐fold coverage. Three different assessments of sequence coverage bias showed consistent biases for WES but not for WGS. We found no clear differences for the technologies concerning their ability to achieve complete coverage of 2,759 clinically relevant genes. We show that WES performs comparably to WGS in terms of covered bases if sequenced at two to three times higher coverage. This does, however, come at the cost of substantially more sequencing biases in WES approaches. Our findings will guide laboratories to make an informed decision on which sequencing platform and coverage to choose. PMID:25973577
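
    The "captures 95% of the coding regions with a minimal coverage of 20x" style of metric can be illustrated with a short sketch. The input format (a plain list of per-base read depths over coding positions) and the function name are assumptions for illustration; the abstract does not describe the authors' pipeline.

```python
def fraction_covered(depths, min_depth=20):
    """Fraction of positions whose read depth is at least min_depth.

    depths: hypothetical list of per-base read depths over coding positions.
    """
    if not depths:
        return 0.0
    return sum(d >= min_depth for d in depths) / len(depths)
```

    For example, fraction_covered([25, 30, 10, 20]) counts three of four positions at 20x or more.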

  19. Evaluation of correlation property of linear-frequency-modulated signals coded by maximum-length sequences

    NASA Astrophysics Data System (ADS)

    Yamanaka, Kota; Hirata, Shinnosuke; Hachiya, Hiroyuki

    2016-07-01

    Ultrasonic distance measurement for obstacles has been recently applied in automobiles. The pulse–echo method based on the transmission of an ultrasonic pulse and time-of-flight (TOF) determination of the reflected echo is one of the typical methods of ultrasonic distance measurement. Improvement of the signal-to-noise ratio (SNR) of the echo and the avoidance of crosstalk between ultrasonic sensors in the pulse–echo method are required in automotive measurement. The SNR of the reflected echo and the resolution of the TOF are improved by the employment of pulse compression using a maximum-length sequence (M-sequence), which is one of the binary pseudorandom sequences generated from a linear feedback shift register (LFSR). Crosstalk is avoided by using transmitted signals coded by different M-sequences generated from different LFSRs. In the case of lower-order M-sequences, however, the number of measurement channels corresponding to the pattern of the LFSR is not enough. In this paper, pulse compression using linear-frequency-modulated (LFM) signals coded by M-sequences has been proposed. The coding of LFM signals by the same M-sequence can produce different transmitted signals and increase the number of measurement channels. In the proposed method, however, the truncation noise in autocorrelation functions and the interference noise in cross-correlation functions degrade the SNRs of received echoes. Therefore, autocorrelation properties and cross-correlation properties in all patterns of combinations of coded LFM signals are evaluated.
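
    The M-sequence generation step mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: the order-4 register and the recurrence a[n+4] = a[n] XOR a[n+1] (characteristic polynomial x^4 + x + 1) are chosen here as an assumed example, and the LFM coding stage itself is not shown.

```python
def m_sequence(order, taps, seed=None):
    """One period (2**order - 1 bits) of an LFSR sequence.

    The recurrence is a[n + order] = XOR of a[n + t] for t in taps.
    The taps must correspond to a primitive polynomial for the output
    to be a maximum-length sequence.
    """
    state = list(seed) if seed else [1] * order
    out = []
    for _ in range(2 ** order - 1):
        out.append(state[0])
        new_bit = 0
        for t in taps:
            new_bit ^= state[t]
        state = state[1:] + [new_bit]
    return out


def periodic_autocorrelation(bits):
    """Periodic autocorrelation of the sequence after mapping 0 -> +1, 1 -> -1."""
    s = [1 - 2 * b for b in bits]
    n = len(s)
    return [sum(s[i] * s[(i + k) % n] for i in range(n)) for k in range(n)]


seq = m_sequence(4, taps=(0, 1))
acf = periodic_autocorrelation(seq)
```

    For a true M-sequence the periodic autocorrelation equals the sequence length at zero lag and -1 at every other lag, which is the property that makes these codes attractive for pulse compression. Truncating or otherwise modifying the sequence loses this ideal behaviour, consistent with the truncation noise the abstract mentions.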

  20. Evaluation of correlation property of linear-frequency-modulated signals coded by maximum-length sequences

    NASA Astrophysics Data System (ADS)

    Yamanaka, Kota; Hirata, Shinnosuke; Hachiya, Hiroyuki

    2016-07-01

    Ultrasonic distance measurement for obstacles has been recently applied in automobiles. The pulse-echo method based on the transmission of an ultrasonic pulse and time-of-flight (TOF) determination of the reflected echo is one of the typical methods of ultrasonic distance measurement. Improvement of the signal-to-noise ratio (SNR) of the echo and the avoidance of crosstalk between ultrasonic sensors in the pulse-echo method are required in automotive measurement. The SNR of the reflected echo and the resolution of the TOF are improved by the employment of pulse compression using a maximum-length sequence (M-sequence), which is one of the binary pseudorandom sequences generated from a linear feedback shift register (LFSR). Crosstalk is avoided by using transmitted signals coded by different M-sequences generated from different LFSRs. In the case of lower-order M-sequences, however, the number of measurement channels corresponding to the pattern of the LFSR is not enough. In this paper, pulse compression using linear-frequency-modulated (LFM) signals coded by M-sequences has been proposed. The coding of LFM signals by the same M-sequence can produce different transmitted signals and increase the number of measurement channels. In the proposed method, however, the truncation noise in autocorrelation functions and the interference noise in cross-correlation functions degrade the SNRs of received echoes. Therefore, autocorrelation properties and cross-correlation properties in all patterns of combinations of coded LFM signals are evaluated.

  1. From Artificial Amino Acids to Sequence-Defined Targeted Oligoaminoamides.

    PubMed

    Morys, Stephan; Wagner, Ernst; Lächelt, Ulrich

    2016-01-01

    Artificial oligoamino acids with appropriate protecting groups can be used for the sequential assembly of oligoaminoamides on solid-phase. With the help of these oligoamino acids multifunctional nucleic acid (NA) carriers can be designed and produced in highly defined topologies. Here we describe the synthesis of the artificial oligoamino acid Fmoc-Stp(Boc3)-OH, the subsequent assembly into sequence-defined oligomers and the formulation of tumor-targeted plasmid DNA (pDNA) polyplexes. PMID:27436323

  2. Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences]

    NASA Technical Reports Server (NTRS)

    Gatlin, L. L.

    1974-01-01

    Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.
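
    As a rough illustration of the first randomness measure above (some amino acids occurring more often than others), the sketch below computes a first-order Shannon redundancy, R = 1 - H / log2(20), from single-residue frequencies. It is an assumption-laden simplification: the pair-frequency term described in the abstract is omitted, and the function names are not from the report.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """First-order entropy in bits per residue, from single-residue frequencies."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def redundancy(seq, alphabet_size=20):
    """1 - H / H_max, where H_max = log2(alphabet_size) for equiprobable residues."""
    return 1.0 - shannon_entropy(seq) / math.log2(alphabet_size)
```

    A sequence using all 20 residues equally often has redundancy near 0, while a homopolymer has redundancy 1. Short chains cannot sample all residues evenly, which is one way the chain-length constraint noted in the abstract shows up.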

  3. Orpinomyces cellulase CelE protein and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2000-08-29

    A cDNA designated celE cloned from Orpinomyces PC-2 encodes a polypeptide (CelE) of 477 amino acids. CelE is highly homologous to CelB of Orpinomyces (72.3% identity) and Neocallimastix (67.9% identity), and like them, it has a non-catalytic repeated peptide domain (NCRPD) at the C-terminal end. The catalytic domain of CelE is homologous to glycosyl hydrolases of Family 5, found in several anaerobic bacteria. The gene of celE is devoid of introns. The recombinant proteins CelE and CelB of Orpinomyces PC-2 randomly hydrolyze carboxymethylcellulose and cello-oligosaccharides in the pattern of endoglucanases.

  4. Complete cDNA and derived amino acid sequence of human factor V.

    PubMed Central

    Jenny, R J; Pittman, D D; Toole, J J; Kriz, R W; Aldape, R A; Hewick, R M; Kaufman, R J; Mann, K G

    1987-01-01

    cDNA clones encoding human factor V have been isolated from an oligo(dT)-primed human fetal liver cDNA library prepared with vector Charon 21A. The cDNA sequence of factor V from three overlapping clones includes a 6672-base-pair (bp) coding region, a 90-bp 5' untranslated region, and a 163-bp 3' untranslated region within which is a poly(A) tail. The deduced amino acid sequence consists of 2224 amino acids inclusive of a 28-amino acid leader peptide. Direct comparison with human factor VIII reveals considerable homology between proteins in amino acid sequence and domain structure: a triplicated A domain and duplicated C domain show approximately 40% identity with the corresponding domains in factor VIII. As in factor VIII, the A domains of factor V share approximately 40% amino acid-sequence homology with the three highly conserved domains in ceruloplasmin. The B domain of factor V contains 35 tandem and approximately 9 additional semiconserved repeats of nine amino acids of the form Asp-Leu-Ser-Gln-Thr-Thr/Asn-Leu-Ser-Pro and 2 additional semiconserved repeats of 17 amino acids. Factor V contains 37 potential N-linked glycosylation sites, 25 of which are in the B domain, and a total of 19 cysteine residues. PMID:3110773

  5. Complete cDNA and derived amino acid sequence of human factor V

    SciTech Connect

    Jenny, R.J.; Pittman, D.D.; Toole, J.J.; Kriz, R.W.; Aldape, R.A.; Hewick, R.M.; Kaufman, R.J.; Mann, K.G.

    1987-07-01

    cDNA clones encoding human factor V have been isolated from an oligo(dT)-primed human fetal liver cDNA library prepared with vector Charon 21A. The cDNA sequence of factor V from three overlapping clones includes a 6672-base-pair (bp) coding region, a 90-bp 5' untranslated region, and a 163-bp 3' untranslated region within which is a poly(A) tail. The deduced amino acid sequence consists of 2224 amino acids inclusive of a 28-amino acid leader peptide. Direct comparison with human factor VIII reveals considerable homology between proteins in amino acid sequence and domain structure: a triplicated A domain and duplicated C domain show approx. 40% identity with the corresponding domains in factor VIII. As in factor VIII, the A domains of factor V share approx. 40% amino acid-sequence homology with the three highly conserved domains in ceruloplasmin. The B domain of factor V contains 35 tandem and approx. 9 additional semiconserved repeats of nine amino acids of the form Asp-Leu-Ser-Gln-Thr-Thr/Asn-Leu-Ser-Pro and 2 additional semiconserved repeats of 17 amino acids. Factor V contains 37 potential N-linked glycosylation sites, 25 of which are in the B domain, and a total of 19 cysteine residues.

  6. Transcriptome Sequencing in Response to Salicylic Acid in Salvia miltiorrhiza.

    PubMed

    Zhang, Xiaoru; Dong, Juane; Liu, Hailong; Wang, Jiao; Qi, Yuexin; Liang, Zongsuo

    2016-01-01

    Salvia miltiorrhiza is a traditional Chinese herbal medicine, whose quality and yield are often affected by diseases and environmental stresses during its growing season. Salicylic acid (SA) plays a significant role in plants responding to biotic and abiotic stresses, but the involved regulatory factors and their signaling mechanisms are largely unknown. In order to identify the genes involved in SA signaling, the RNA sequencing (RNA-seq) strategy was employed to evaluate the transcriptional profiles in S. miltiorrhiza cell cultures. A total of 50,778 unigenes were assembled, in which 5,316 unigenes were differentially expressed among 0-, 2-, and 8-h SA induction. The up-regulated genes were mainly involved in stimulus response and multi-organism process. A core set of candidate novel genes coding SA signaling component proteins was identified. Many transcription factors (e.g., WRKY, bHLH and GRAS) and genes involved in hormone signal transduction were differentially expressed in response to SA induction. Detailed analysis revealed that genes associated with defense signaling, such as antioxidant system genes, cytochrome P450s and ATP-binding cassette transporters, were significantly overexpressed, which can be used as genetic tools to investigate disease resistance. Our transcriptome analysis will help understand SA signaling and its mechanism of defense systems in S. miltiorrhiza. PMID:26808150

  7. Transcriptome Sequencing in Response to Salicylic Acid in Salvia miltiorrhiza

    PubMed Central

    Zhang, Xiaoru; Dong, Juane; Liu, Hailong; Wang, Jiao; Qi, Yuexin; Liang, Zongsuo

    2016-01-01

    Salvia miltiorrhiza is a traditional Chinese herbal medicine, whose quality and yield are often affected by diseases and environmental stresses during its growing season. Salicylic acid (SA) plays a significant role in plants responding to biotic and abiotic stresses, but the involved regulatory factors and their signaling mechanisms are largely unknown. In order to identify the genes involved in SA signaling, the RNA sequencing (RNA-seq) strategy was employed to evaluate the transcriptional profiles in S. miltiorrhiza cell cultures. A total of 50,778 unigenes were assembled, in which 5,316 unigenes were differentially expressed among 0-, 2-, and 8-h SA induction. The up-regulated genes were mainly involved in stimulus response and multi-organism process. A core set of candidate novel genes coding SA signaling component proteins was identified. Many transcription factors (e.g., WRKY, bHLH and GRAS) and genes involved in hormone signal transduction were differentially expressed in response to SA induction. Detailed analysis revealed that genes associated with defense signaling, such as antioxidant system genes, cytochrome P450s and ATP-binding cassette transporters, were significantly overexpressed, which can be used as genetic tools to investigate disease resistance. Our transcriptome analysis will help understand SA signaling and its mechanism of defense systems in S. miltiorrhiza. PMID:26808150

  8. Nucleotide Sequence Analyses and Predicted Coding of Bunyavirus Genome RNA Species

    PubMed Central

    Clerx-van Haaster, Corrie M.; Akashi, Hiroomi; Auperin, David D.; Bishop, David H. L.

    1982-01-01

    We performed 3′ RNA sequence analyses of [32P]pCp-end-labeled La Crosse (LAC) virus, alternate LAC virus isolate L74, and snowshoe hare bunyavirus large (L), medium (M), and small (S) negative-stranded viral RNA species to determine the coding capabilities of these species. These analyses were confirmed by dideoxy primer extension studies in which we used a synthetic oligodeoxynucleotide primer complementary to the conserved 3′-terminal decanucleotide of the three viral RNA species (Clerx-van Haaster and Bishop, Virology 105:564-574, 1980). The deduced sequences predicted translation of two S-RNA gene products that were read in overlapping reading frames. So far, only single contiguous open reading frames have been identified for the viral M- and L-RNA species. For the negative-stranded M-RNA species of all three viruses, the single reading frame developed from the first 3′-proximal UAC triplet. Likewise, for the L-RNA of the alternate LAC isolate, a single open reading frame developed from the first 3′-proximal UAC triplet. The corresponding L-RNA sequences of prototype LAC and snowshoe hare viruses initiated open reading frames; however, for both viral L-RNA species there was a preceding 3′-proximal UAC triplet in another reading frame that was followed shortly afterward by a termination codon. A comparison of the sequence data obtained for snowshoe hare virus, LAC virus, and the alternate LAC virus isolate showed that the identified nucleotide substitutions were sufficient to account for some of the fingerprint differences in the L-, M-, and S-RNA species of the three viruses. Unlike the distribution of the L- and M-RNA substitutions, significantly fewer nucleotide substitutions occurred after the initial UAC triplet of the S-RNA species than before this triplet, implying that the overlapping genes of the S RNA provided a constraint against evolution by point mutation. The comparative sequence analyses predicted amino acid differences among the

  9. SRComp: short read sequence compression using burstsort and Elias omega coding.

    PubMed

    Selva, Jeremy John; Chen, Xin

    2013-01-01

    Next-generation sequencing (NGS) technologies permit the rapid production of vast amounts of data at low cost. Economical data storage and transmission hence becomes an increasingly important challenge for NGS experiments. In this paper, we introduce a new non-reference based read sequence compression tool called SRComp. It works by first employing a fast string-sorting algorithm called burstsort to sort read sequences in lexicographical order and then Elias omega-based integer coding to encode the sorted read sequences. SRComp has been benchmarked on four large NGS datasets, where experimental results show that it can run 5-35 times faster than current state-of-the-art read sequence compression tools such as BEETL and SCALCE, while retaining comparable compression efficiency for large collections of short read sequences. SRComp is a read sequence compression tool that is particularly valuable in certain applications where compression time is of major concern.
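
    The Elias omega integer code named above can be sketched in a few lines. The sketch shows only the standard code for positive integers as bit strings; how SRComp maps sorted reads to integers is not described in the abstract, so that step is left out, and the function names are illustrative.

```python
def elias_omega_encode(n):
    """Elias omega code of a positive integer, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("Elias omega coding is defined for integers >= 1")
    code = "0"                      # terminating zero
    while n > 1:
        b = bin(n)[2:]              # binary form, always starts with '1'
        code = b + code             # prepend the current group
        n = len(b) - 1              # next, encode the group length minus one
    return code


def elias_omega_decode(bits, pos=0):
    """Decode one integer starting at bits[pos]; return (value, next position)."""
    n = 1
    while bits[pos] == "1":
        group = bits[pos:pos + n + 1]   # a group of n + 1 bits starting with '1'
        pos += n + 1
        n = int(group, 2)
    return n, pos + 1                   # skip the terminating '0'
```

    Because every code word is self-delimiting, code words can be concatenated and decoded back to back without separators. Small integers get short codes (1 encodes as "0", 4 as "101000"), which suits data where small values dominate.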

  10. Amino acid codes in mitochondria as possible clues to primitive codes

    NASA Technical Reports Server (NTRS)

    Jukes, T. H.

    1981-01-01

    Differences between mitochondrial codes and the universal code indicate that an evolutionary simplification has taken place, rather than a return to a more primitive code. However, these differences make it evident that the universal code is not the only code possible, and therefore earlier codes may have differed markedly from the present code. The present universal code is probably a 'frozen accident.' The change in CUN codons from leucine to threonine (Neurospora vs. yeast mitochondria) indicates that neutral or near-neutral changes occurred in the corresponding proteins when this code change took place, caused presumably by a mutation in a tRNA gene.

  11. Segments of amino acid sequence similarity in beta-amylases.

    PubMed

    Friedberg, F; Rhodes, C

    1988-01-01

    In alpha-amylases from animals, plants and bacteria and in beta-amylases from plants and bacteria a number of segments exhibit amino acid sequence similarity specific to the alpha or to the beta type, respectively. In the case of the beta-amylases the similar sequence regions are extensive and they are disrupted only by short interspersed dissimilar regions. Close to the C terminus, however, no such sequence similarity exists. PMID:2464171

  12. Identification of a conserved sequence in the non-coding regions of many human genes.

    PubMed Central

    Donehower, L A; Slagle, B L; Wilde, M; Darlington, G; Butel, J S

    1989-01-01

    We have analyzed a sequence of approximately 70 base pairs (bp) that shows a high degree of similarity to sequences present in the non-coding regions of a number of human and other mammalian genes. The sequence was discovered in a fragment of human genomic DNA adjacent to an integrated hepatitis B virus genome in cells derived from human hepatocellular carcinoma tissue. When one of the viral flanking sequences was compared to nucleotide sequences in GenBank, more than thirty human genes were identified that contained a similar sequence in their non-coding regions. The sequence element was usually found once or twice in a gene, either in an intron or in the 5' or 3' flanking regions. It did not share any similarities with known short interspersed nucleotide elements (SINEs) or presently known gene regulatory elements. This element was highly conserved at the same position within the corresponding human and mouse genes for myoglobin and N-myc, indicating evolutionary conservation and possible functional importance. Preliminary DNase I footprinting data suggested that the element or its adjacent sequences may bind nuclear factors to generate specific DNase I hypersensitive sites. The size, structure, and evolutionary conservation of this sequence indicate that it is distinct from other types of short interspersed repetitive elements. It is possible that the element may have a cis-acting functional role in the genome. PMID:2536922

  13. An SNR improvement of passive SAW tags with 5-bit Barker code sequence

    NASA Astrophysics Data System (ADS)

    Bae, Hyunchul; Kim, Jaekwon; Burm, Jinwook

    2012-07-01

    Passive surface acoustic wave (SAW) tags require a large signal-to-noise ratio (SNR) in order to increase the interrogation range. For the purpose of achieving high SNR for radio frequency identification (RFID) communication systems, Barker codes, a binary phase shift keying (BPSK) modulation technique, have been adopted in this study. Passive SAW RFID tags were designed with 5-bit Barker code sequences to generate BPSK modulated signals. Through the SNR analysis, the improvements in SNR were about 11 dB using Barker codes along with a correlator, which can be further improved by optimisation in the correlator.
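
    The reason a Barker code pairs well with a correlator can be seen from its autocorrelation: a single sharp peak with sidelobes of magnitude at most 1. The sketch below uses the standard length-5 Barker sequence in a +1/-1 (BPSK-like) representation; the zero-padded test signal and the function names are illustrative assumptions and do not model the SAW tag hardware or the SNR figures reported above.

```python
# Standard 5-bit Barker code, written as BPSK symbols (+1 / -1).
BARKER_5 = [1, 1, 1, -1, 1]


def autocorrelation(code: list[int]) -> list[int]:
    """Aperiodic autocorrelation of the code for lags 0 .. len(code) - 1."""
    n = len(code)
    return [
        sum(code[i] * code[i + lag] for i in range(n - lag))
        for lag in range(n)
    ]


def correlate(signal: list[float], code: list[int]) -> list[float]:
    """Slide the code over the signal and return the dot product per offset."""
    n = len(code)
    return [
        sum(signal[offset + i] * code[i] for i in range(n))
        for offset in range(len(signal) - n + 1)
    ]


if __name__ == "__main__":
    print("autocorrelation:", autocorrelation(BARKER_5))
    # Hypothetical noiseless received signal: the code embedded after two zeros.
    received = [0, 0] + BARKER_5 + [0, 0, 0]
    output = correlate(received, BARKER_5)
    print("correlator output:", output)
    print("peak at offset:", output.index(max(output)))
```

    The correlator output peaks where the embedded code aligns with the reference, which is the property a receiver can exploit to pull the tag response out of noise.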

  14. A lossless compression method for medical image sequences using JPEG-LS and interframe coding.

    PubMed

    Miaou, Shaou-Gang; Ke, Fu-Sheng; Chen, Shu-Ching

    2009-09-01

    Hospitals and medical centers produce an enormous amount of digital medical images every day, especially in the form of image sequences, which requires considerable storage space. One solution could be the application of lossless compression. Among available methods, JPEG-LS has excellent coding performance. However, it only compresses a single picture with intracoding and does not utilize the interframe correlation among pictures. Therefore, this paper proposes a method that combines the JPEG-LS and an interframe coding with motion vectors to enhance the compression performance of using JPEG-LS alone. Since the interframe correlation between two adjacent images in a medical image sequence is usually not as high as that in a general video image sequence, the interframe coding is activated only when the interframe correlation is high enough. With six capsule endoscope image sequences under test, the proposed method achieves average compression gains of 13.3% and 26.3% over the methods of using JPEG-LS and JPEG2000 alone, respectively. Similarly, for an MRI image sequence, coding gains of 77.5% and 86.5% are correspondingly obtained.
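
    The idea of switching interframe coding on only when adjacent frames are sufficiently correlated can be sketched as below. This is a simplified illustration under stated assumptions: frames are flat lists of pixel values, the correlation measure is a plain Pearson coefficient, the 0.9 threshold is hypothetical, and neither motion vectors nor the JPEG-LS coder itself is modelled. The sketch only decides which data would be handed to the lossless coder.

```python
from math import sqrt


def frame_correlation(a: list[int], b: list[int]) -> float:
    """Pearson correlation between two equally sized frames (flat pixel lists)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant frame: treat as uncorrelated
    return cov / sqrt(var_a * var_b)


def choose_coding_input(previous, current, threshold=0.9):
    """Return ("inter", residual) when frames are similar, else ("intra", frame).

    The threshold is an assumed value for illustration only.
    """
    if previous is not None and frame_correlation(previous, current) >= threshold:
        residual = [c - p for c, p in zip(current, previous)]
        return "inter", residual
    return "intra", list(current)


if __name__ == "__main__":
    frame_1 = [10, 20, 30, 40]
    frame_2 = [11, 21, 31, 41]   # nearly identical to frame_1
    frame_3 = [40, 10, 35, 5]    # unrelated content
    print(choose_coding_input(None, frame_1))
    print(choose_coding_input(frame_1, frame_2))
    print(choose_coding_input(frame_2, frame_3))
```

    Because the residual is an exact integer difference, the decoder can add it back to the previous frame and recover the current frame without loss.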

  15. Nucleotide deletion and P addition in V(D)J recombination: a determinant role of the coding-end sequence.

    PubMed Central

    Nadel, B; Feeney, A J

    1997-01-01

    During V(D)J recombination, the coding ends to be joined are extensively modified. Those modifications, termed coding-end processing, consist of removal and addition of various numbers of nucleotides. We previously showed in vivo that coding-end processing is specific for each coding end, suggesting that specific motifs in a coding-end sequence influence nucleotide deletion and P-region formation. In this study, we created a panel of recombination substrates containing actual immunoglobulin and T-cell receptor coding-end sequences and dissected the role of each motif by comparing its processing pattern with those of variants containing minimal nucleotide changes from the original sequence. Our results demonstrate the determinant role of specific sequence motifs on coding-end processing and also the importance of the context in which they are found. We show that minimal nucleotide changes in key positions of a coding-end sequence can result in dramatic changes in the processing pattern. We propose that each coding-end sequence dictates a unique hairpin structure, the result of a particular energy conformation between nucleotides organizing the loop and the stem, and that the interplay between this structure and specific sequence motifs influences the frequency and location of nicks which open the coding-end hairpin. These findings indicate that the sequences of the coding ends determine their own processing and have a profound impact on the development of the primary B- and T-cell repertoires. PMID:9199310

  16. Coding-complete sequencing classifies parrot bornavirus 5 into a novel virus species.

    PubMed

    Marton, Szilvia; Bányai, Krisztián; Gál, János; Ihász, Katalin; Kugler, Renáta; Lengyel, György; Jakab, Ferenc; Bakonyi, Tamás; Farkas, Szilvia L

    2015-11-01

    In this study, we determined the sequence of the coding region of an avian bornavirus detected in a blue-and-yellow macaw (Ara ararauna) with pathological/histopathological changes characteristic of proventricular dilatation disease. The genomic organization of the macaw bornavirus is similar to that of other bornaviruses, and its nucleotide sequence is nearly identical to the available partial parrot bornavirus 5 (PaBV-5) sequences. Phylogenetic analysis showed that these strains formed a monophyletic group distinct from other mammalian and avian bornaviruses and in calculations performed with matrix protein coding sequences, the PaBV-5 and PaBV-6 genotypes formed a common cluster, suggesting that according to the recently accepted classification system for bornaviruses, these two genotypes may belong to a new species, provisionally named Psittaciform 2 bornavirus.

  17. Severe accident source term characteristics for selected Peach Bottom sequences predicted by the MELCOR Code

    SciTech Connect

    Carbajo, J.J.

    1993-09-01

    The purpose of this report is to compare in-containment source terms developed for NUREG-1159, which used the Source Term Code Package (STCP), with those generated by MELCOR to identify significant differences. For this comparison, two short-term depressurized station blackout sequences (with a dry cavity and with a flooded cavity) and a Loss-of-Coolant Accident (LOCA) concurrent with complete loss of the Emergency Core Cooling System (ECCS) were analyzed for the Peach Bottom Atomic Power Station (a BWR-4 with a Mark I containment). The results indicate that for the sequences analyzed, the two codes predict similar total in-containment release fractions for each of the element groups. However, the MELCOR/CORBH Package predicts significantly longer times for vessel failure and reduced energy of the released material for the station blackout sequences (when compared to the STCP results). MELCOR also calculated smaller releases into the environment than STCP for the station blackout sequences.

  18. Is there an error correcting code in the base sequence in DNA?

    PubMed Central

    Liebovitch, L S; Tao, Y; Todorov, A T; Levine, L

    1996-01-01

    Modern methods of encoding information into digital form include error check digits that are functions of the other information digits. When digital information is transmitted, the values of the error check digits can be computed from the information digits to determine whether the information has been received accurately. These error correcting codes make it possible to detect and correct common errors in transmission. The sequence of bases in DNA is also a digital code consisting of four symbols: A, C, G, and T. Does DNA also contain an error correcting code? Such a code would allow repair enzymes to protect the fidelity of nonreplicating DNA and increase the accuracy of replication. If a linear block error correcting code is present in DNA then some bases would be a linear function of the other bases in each set of bases. We developed an efficient procedure to determine whether such an error correcting code is present in the base sequence. We illustrate the use of this procedure by using it to analyze the lac operon and the gene for cytochrome c. These genes do not appear to contain such a simple error correcting code. PMID:8874027
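
    What it would mean for some bases to be "a linear function of the other bases" can be illustrated with a toy search. The sketch below is not the authors' procedure: the mapping A, C, G, T to 0, 1, 2, 3, the use of arithmetic modulo 4, and the choice of a single check symbol at the end of each block are all assumptions made for illustration.

```python
from itertools import product

# Assumed numeric labels for the four bases (illustrative only).
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_blocks(sequence: str, block_len: int) -> list[list[int]]:
    """Split the sequence into consecutive non-overlapping numeric blocks."""
    return [
        [BASE_TO_INT[base] for base in sequence[i:i + block_len]]
        for i in range(0, len(sequence) - block_len + 1, block_len)
    ]


def satisfies_check(block: list[int], coeffs: tuple[int, ...]) -> bool:
    """True if the last symbol equals the linear combination of the others mod 4."""
    return sum(c * x for c, x in zip(coeffs, block[:-1])) % 4 == block[-1]


def find_linear_checks(sequence: str, block_len: int) -> list[tuple[int, ...]]:
    """Brute-force all coefficient vectors that hold for every block."""
    blocks = to_blocks(sequence, block_len)
    return [
        coeffs
        for coeffs in product(range(4), repeat=block_len - 1)
        if all(satisfies_check(block, coeffs) for block in blocks)
    ]


if __name__ == "__main__":
    # Constructed so that third base = (first + 2 * second) mod 4 in every block.
    coded = "ACGCACGTATCC"
    print(find_linear_checks(coded, 3))
    # A sequence with no such relation yields an empty list.
    print(find_linear_checks("ACGTACGTACGA", 3))
```

    An empty result for a real gene under this toy model would correspond to the abstract's conclusion that no such simple code appears to be present.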

  19. Concerted evolution at a multicopy locus in the protozoan parasite Theileria parva: extreme divergence of potential protein-coding sequences.

    PubMed Central

    Bishop, R; Musoke, A; Morzaria, S; Sohanpal, B; Gobright, E

    1997-01-01

    Concerted evolution of multicopy gene families in vertebrates is recognized as an important force in the generation of biological novelty but has not been documented for the multicopy genes of protozoa. A multicopy locus, Tpr, which consists of tandemly arrayed open reading frames (ORFs) containing several repeated elements has been described for Theileria parva. Herein we show that probes derived from the 5'/N-terminal ends of ORFs in the genomic DNAs of T. parva Uganda (1,108 codons) and Boleni (699 codons) hybridized with multicopy sequences in homologous DNA but did not detect similar sequences in the DNA of 14 heterologous T. parva stocks and clones. The probe sequences were, however, protein coding according to predictive algorithms and codon usage. The 3'/C-terminal ends of the Uganda and Boleni ORFs exhibited 75% similarity and identity, respectively, to the previously identified Tpr1 and Tpr2 repetitive elements of T. parva Muguga. Tpr1-homologous sequences were detected in two additional species of Theileria. Eight different Tpr1-homologous transcripts were present in piroplasm mRNA from a single T. parva Muguga-infected animal. The Tpr1 and Tpr2 amino acid sequences contained six predicted membrane-associated segments. The ratio of synonymous to nonsynonymous substitutions indicates that Tpr1 evolves like protein-encoding DNA. The previously determined nucleotide sequence of the gene encoding the p67 antigen is completely identical in T. parva Muguga, Boleni, and Uganda, including the third base in codons. The data suggest that concerted evolution can lead to the radical divergence of coding sequences and that this can be a mechanism for the generation of novel genes. PMID:9032293

  20. Molecular cloning and sequence determination of the nuclear gene coding for mitochondrial elongation factor Tu of Saccharomyces cerevisiae.

    PubMed

    Nagata, S; Tsunetsugu-Yokota, Y; Naito, A; Kaziro, Y

    1983-10-01

    A 3.1-kilobase Bgl II fragment of Saccharomyces cerevisiae carrying the nuclear gene encoding the mitochondrial polypeptide chain elongation factor (EF) Tu has been cloned on pBR327 to yield a chimeric plasmid pYYB. The identification of the gene designated as tufM was based on the cross-hybridization with the Escherichia coli tufB gene, under low stringency conditions. The complete nucleotide sequence of the yeast tufM gene was established together with its 5'- and 3'-flanking regions. The sequence contained 1,311 nucleotides coding for a protein of 437 amino acids with a calculated Mr of 47,980. The nucleotide sequence and the deduced amino acid sequence of tufM were 60% and 66% homologous, respectively, to the corresponding sequences of E. coli tufA, when aligned to obtain the maximal homology. Plasmid YRpYB was then constructed by cloning the 2.5-kilobase EcoRI fragment of pYYB carrying tufM into a yeast cloning vector YRp-7. A mRNA hybridizable with tufM was isolated from the total mRNA of S. cerevisiae D13-1A transformed with YRpYB and translated in the reticulocyte lysate. The mRNA could direct the synthesis of a protein with Mr 48,000, which was immunoprecipitated with an anti-E. coli EF-Tu antibody but not with an antibody against yeast cytoplasmic EF-1 alpha. The results indicate that the tufM gene is a nuclear gene coding for the yeast mitochondrial EF-Tu. PMID:6353412

  1. Biosynthesis of riboflavin: cloning, sequencing, and expression of the gene coding for 3,4-dihydroxy-2-butanone 4-phosphate synthase of Escherichia coli.

    PubMed Central

    Richter, G; Volk, R; Krieger, C; Lahm, H W; Röthlisberger, U; Bacher, A

    1992-01-01

    3,4-Dihydroxy-2-butanone 4-phosphate is biosynthesized from ribulose 5-phosphate and serves as the biosynthetic precursor for the xylene ring of riboflavin. The gene coding for 3,4-dihydroxy-2-butanone 4-phosphate synthase of Escherichia coli has been cloned and sequenced. The gene codes for a protein of 217 amino acid residues with a calculated molecular mass of 23,349.6 Da. The enzyme was purified to near homogeneity from a recombinant E. coli strain and had a specific activity of 1,700 nmol mg^-1 h^-1. The N-terminal amino acid sequence and the amino acid composition of the protein were in agreement with the deduced sequence. The molecular mass as determined by ion spray mass spectrometry was 23,351 +/- 2 Da, which is in agreement with the predicted mass. The previously reported loci htrP, "luxH-like," and ribB at 66 min of the E. coli chromosome are all identical to the gene coding for 3,4-dihydroxy-2-butanone 4-phosphate synthase, but their role had not been hitherto determined. Sequence homology indicates that gene luxH of Vibrio harveyi and the central open reading frame of the Bacillus subtilis riboflavin operon code for 3,4-dihydroxy-2-butanone 4-phosphate synthase. PMID:1597419

  2. Applications for protein sequence-function evolution data: mRNA/protein expression analysis and coding SNP scoring tools.

    PubMed

    Thomas, Paul D; Kejariwal, Anish; Guo, Nan; Mi, Huaiyu; Campbell, Michael J; Muruganujan, Anushya; Lazareva-Ulitsky, Betty

    2006-07-01

    The vast amount of protein sequence data now available, together with accumulating experimental knowledge of protein function, enables modeling of protein sequence and function evolution. The PANTHER database was designed to model evolutionary sequence-function relationships on a large scale. There are a number of applications for these data, and we have implemented web services that address three of them. The first is a protein classification service. Proteins can be classified, using only their amino acid sequences, to evolutionary groups at both the family and subfamily levels. Specific subfamilies, and often families, are further classified when possible according to their functions, including molecular function and the biological processes and pathways they participate in. The second application, then, is an expression data analysis service, where functional classification information can help find biological patterns in the data obtained from genome-wide experiments. The third application is a coding single-nucleotide polymorphism scoring service. In this case, information about evolutionarily related proteins is used to assess the likelihood of a deleterious effect on protein function arising from a single substitution at a specific amino acid position in the protein. All three web services are available at http://www.pantherdb.org/tools.

  3. Purifying selection shapes the coincident SNP distribution of primate coding sequences.

    PubMed

    Chen, Chia-Ying; Hung, Li-Yuan; Wu, Chan-Shuo; Chuang, Trees-Juen

    2016-01-01

    Genome-wide analysis has observed an excess of coincident single nucleotide polymorphisms (coSNPs) at human-chimpanzee orthologous positions, and suggested that this is due to cryptic variation in the mutation rate. While this phenomenon primarily corresponds with non-coding coSNPs, the situation in coding sequences remains unclear. Here we calculate the observed-to-expected ratio of coSNPs (coSNPO/E) to estimate the prevalence of human-chimpanzee coSNPs, and show that the excess of coSNPs is also present in coding regions. Intriguingly, coSNPO/E is much higher at zero-fold than at nonzero-fold degenerate sites; such a difference is due to an elevation of coSNPO/E at zero-fold degenerate sites, rather than a reduction at nonzero-fold degenerate ones. These trends are independent of chimpanzee subpopulation, population size, or sequencing techniques; and hold in broad generality across primates. We find that this discrepancy cannot be fully explained by sequence contexts, shared ancestral polymorphisms, SNP density, and recombination rate, and that coSNPO/E in coding sequences is significantly influenced by purifying selection. We also show that selection and mutation rate affect coSNPO/E independently, and coSNPs tend to be less damaging and more correlated with human diseases than non-coSNPs. These suggest that coSNPs may represent a "signature" during primate protein evolution. PMID:27255481

  4. Complete coding sequence of Zika virus from Martinique outbreak in 2015.

    PubMed

    Piorkowski, G; Richard, P; Baronti, C; Gallian, P; Charrel, R; Leparc-Goffart, I; de Lamballerie, X

    2016-05-01

    Zika virus is an Aedes-borne Flavivirus causing fever, arthralgia, myalgia, and rash, associated with Guillain-Barré syndrome and suspected to induce microcephaly in the fetus. We report here the complete coding sequence of the first characterized Caribbean Zika virus strain, isolated from a patient from Martinique in December, 2015. PMID:27274849

  5. First Complete Coding Sequence of a Spanish Isolate of Swine Vesicular Disease Virus.

    PubMed

    Vázquez-Calvo, Ángela; Saiz, Juan-Carlos; Sobrino, Francisco; Martín-Acebes, Miguel A

    2016-03-03

    Swine vesicular disease virus (SVDV) is a porcine pathogen and a member of the Enterovirus genus within the Picornaviridae family. The SVDV genome is composed of a single-stranded RNA molecule of positive polarity. Here, we report the first complete sequence of the coding region of a Spanish SVDV isolate (SPA/1/'93).

  6. Purifying selection shapes the coincident SNP distribution of primate coding sequences

    PubMed Central

    Chen, Chia-Ying; Hung, Li-Yuan; Wu, Chan-Shuo; Chuang, Trees-Juen

    2016-01-01

    Genome-wide analysis has observed an excess of coincident single nucleotide polymorphisms (coSNPs) at human-chimpanzee orthologous positions, and suggested that this is due to cryptic variation in the mutation rate. While this phenomenon primarily corresponds with non-coding coSNPs, the situation in coding sequences remains unclear. Here we calculate the observed-to-expected ratio of coSNPs (coSNPO/E) to estimate the prevalence of human-chimpanzee coSNPs, and show that the excess of coSNPs is also present in coding regions. Intriguingly, coSNPO/E is much higher at zero-fold than at nonzero-fold degenerate sites; such a difference is due to an elevation of coSNPO/E at zero-fold degenerate sites, rather than a reduction at nonzero-fold degenerate ones. These trends are independent of chimpanzee subpopulation, population size, or sequencing techniques; and hold in broad generality across primates. We find that this discrepancy cannot be fully explained by sequence contexts, shared ancestral polymorphisms, SNP density, and recombination rate, and that coSNPO/E in coding sequences is significantly influenced by purifying selection. We also show that selection and mutation rate affect coSNPO/E independently, and coSNPs tend to be less damaging and more correlated with human diseases than non-coSNPs. These suggest that coSNPs may represent a “signature” during primate protein evolution. PMID:27255481

  7. Complete coding sequence of Zika virus from a French Polynesia outbreak in 2013.

    PubMed

    Baronti, Cécile; Piorkowski, Géraldine; Charrel, Rémi N; Boubis, Laetitia; Leparc-Goffart, Isabelle; de Lamballerie, Xavier

    2014-01-01

    Zika virus is an arthropod-borne Flavivirus member of the Spondweni serocomplex, transmitted by Aedes mosquitoes. We report here the complete coding sequence of a Zika virus strain belonging to the Asian lineage, isolated from an infected patient returning from French Polynesia, an epidemic area in 2013/2014.

  8. Complete coding sequence of Zika virus from Martinique outbreak in 2015.

    PubMed

    Piorkowski, G; Richard, P; Baronti, C; Gallian, P; Charrel, R; Leparc-Goffart, I; de Lamballerie, X

    2016-05-01

    Zika virus is an Aedes-borne Flavivirus causing fever, arthralgia, myalgia, and rash, associated with Guillain-Barré syndrome and suspected to induce microcephaly in the fetus. We report here the complete coding sequence of the first characterized Caribbean Zika virus strain, isolated from a patient from Martinique in December, 2015.

  9. Complete Coding Genome Sequence of a Putative Novel Teschovirus Serotype 12 Strain

    PubMed Central

    Jiménez-Clavero, M. A.

    2016-01-01

    Porcine teschoviruses are ubiquitous and prevalent viruses generally harmless to their hosts, the suids. Here, we report the first complete coding genome sequence of a putative new serotype of porcine teschovirus (PTV-12), strain CC25, isolated from fecal material from a healthy pig in Spain. PMID:26966207

  10. POLYMORPHISM IN THE CODING REGION SEQUENCE OF GDF8 GENE IN INDIAN SHEEP.

    PubMed

    Pothuraju, M; Mishra, S K; Kumar, S N; Mohamed, N F; Kataria, R S; Yadav, D K; Arora, R

    2015-11-01

    The present study was undertaken to identify polymorphism in the coding sequence of the GDF8 gene across indigenous meat type sheep breeds. A 1647 bp sequence was generated, encompassing 208 bp of the 5'UTR, 1128 bp of coding region (exon 1, 2 and 3) as well as 311 bp of 3'UTR. The sheep and goat GDF8 gene sequences were observed to be highly conserved as compared to cattle, buffalo, horse and pig. Several nucleotide variations were observed across the coding sequence of the GDF8 gene in Indian sheep. Three polymorphic sites were identified in the 5'UTR, one in exon 1 and one in the exon 2 regions. Both SNPs in the exonic region were found to be non-synonymous. The mutations c.539T > G and c.821T > A discovered in this study in exon 1 and exon 2, respectively, have not been previously reported. The information generated provides a preliminary indication of the functional diversity present in Indian sheep at the coding region of the GDF8 gene. The novel as well as the previously reported SNPs discovered in the Indian sheep warrant further analysis to see whether they affect the phenotype. Future studies will need to establish the effect of the reported SNPs on the expression of the GDF8 gene in the Indian sheep population. PMID:26845859

  11. First Complete Coding Sequence of a Spanish Isolate of Swine Vesicular Disease Virus

    PubMed Central

    Vázquez-Calvo, Ángela; Saiz, Juan-Carlos; Martín-Acebes, Miguel A.

    2016-01-01

    Swine vesicular disease virus (SVDV) is a porcine pathogen and a member of the Enterovirus genus within the Picornaviridae family. The SVDV genome is composed of a single-stranded RNA molecule of positive polarity. Here, we report the first complete sequence of the coding region of a Spanish SVDV isolate (SPA/1/'93). PMID:26941157

  12. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing

    PubMed Central

    Dasenko, Mark A.

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles

  13. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

    PubMed

    Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles

  14. Bovine dopamine beta-hydroxylase cDNA. Complete coding sequence and expression in mammalian cells with vaccinia virus vector.

    PubMed

    Lewis, E J; Allison, S; Fader, D; Claflin, V; Baizer, L

    1990-01-15

    We have isolated cDNA clones for bovine dopamine beta-hydroxylase from an adrenal medulla cDNA library and have determined the complete coding sequence. The largest cDNA clone isolated from the library is 2.4 kilobase pairs (kb) and contains an open reading frame of 1788 bases, coding for a protein of 597 amino acids and Mr = 66,803. The predicted amino acid sequence of the bovine cDNA contains 85% identity with human dopamine beta-hydroxylase (Lamouroux, A., Vingny, A., Faucon Biquet, N., Darmon, M. C., Franck, R., Henry, J.P., and Mallet, J. (1987) EMBO J. 6, 3931-3937; Kobayashi, K., Kurosawa, Y., Fujita, K., and Nagatsu, T. (1989) Nucleic Acids Res. 17, 1089-1102). Northern blot analysis reveals that the cDNA hybridizes to an mRNA of 2.4 kb present in bovine adrenal medulla, but not in kidney, heart, or liver. In addition, the cDNA hybridizes to a second RNA species of 5.5 kb, which is 4-fold less abundant than the 2.4-kb RNA. In vitro translation of a synthetic RNA transcribed from the 2.4-kb cDNA produces a 68-kDa protein, which is specifically immunoprecipitated by antiserum to bovine dopamine beta-hydroxylase. The 2.4-kb cDNA was cloned into a vaccinia virus vector, and the recombinant virus was used to infect the rat pheochromocytoma PC12 and monkey BSC-40 fibroblast cell lines. In both cell lines, infection with recombinant virus produces a protein of Mr = 75,000, which reacts with antiserum to bovine dopamine beta-hydroxylase. These results indicate that the 2.4-kb cDNA contains the genetic information necessary to code for the bovine dopamine beta-hydroxylase subunit.

  15. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    NASA Technical Reports Server (NTRS)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior, whereas the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
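
    The two measurements named above, a Zipf-style rank ordering of n-tuples and an n-gram entropy, can be sketched in a few lines. This is a minimal illustration rather than the authors' code: it counts overlapping n-grams, and it assumes redundancy is defined as one minus the entropy divided by its maximum of n * log2(4) bits, which may differ from the normalisation used in the paper.

```python
from collections import Counter
from math import log2


def ngram_counts(sequence: str, n: int) -> Counter:
    """Count overlapping n-grams in the sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def ngram_entropy(sequence: str, n: int) -> float:
    """Shannon entropy (in bits) of the n-gram frequency distribution."""
    counts = ngram_counts(sequence, n)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def ngram_redundancy(sequence: str, n: int, alphabet_size: int = 4) -> float:
    """Assumed definition: 1 - H_n / (n * log2(alphabet_size))."""
    return 1.0 - ngram_entropy(sequence, n) / (n * log2(alphabet_size))


def zipf_ranking(sequence: str, n: int) -> list[tuple[int, str, int]]:
    """Return (rank, n-gram, count) tuples, most frequent first."""
    ordered = ngram_counts(sequence, n).most_common()
    return [(rank, gram, count) for rank, (gram, count) in enumerate(ordered, 1)]


if __name__ == "__main__":
    toy = "ACGTACGTTTTTACGAACGT"  # made-up sequence, not real genomic data
    print("2-gram entropy:", ngram_entropy(toy, 2))
    print("2-gram redundancy:", ngram_redundancy(toy, 2))
    for rank, gram, count in zipf_ranking(toy, 2)[:5]:
        print(rank, gram, count)
```

    Plotting log(count) against log(rank) from zipf_ranking is the usual way to check for the power-law regime; a lower entropy at fixed n corresponds to a higher redundancy under the assumed definition.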

  16. Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis

    NASA Technical Reports Server (NTRS)

    Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Matsa, M. E.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05) which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10^(-10). We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.
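
    As a rough illustration of the detrended fluctuation analysis mentioned in this abstract, the following Python sketch estimates a DFA scaling exponent for a nucleotide string. The purine/pyrimidine walk rule, the first-order (linear) detrending, the box sizes, and all function names are assumptions for illustration and are not taken from the record.

```python
import math
import random


def dna_walk(seq):
    """Cumulative walk: +1 for C or T, -1 for any other symbol (assumed rule)."""
    position, walk = 0, []
    for base in seq.upper():
        position += 1 if base in "CT" else -1
        walk.append(position)
    return walk


def fluctuation(walk, box):
    """RMS residual after removing a least-squares line from each box."""
    n_boxes = len(walk) // box
    x_mean = (box - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in range(box))
    total = 0.0
    for b in range(n_boxes):
        seg = walk[b * box:(b + 1) * box]
        y_mean = sum(seg) / box
        slope = sum((x - x_mean) * (y - y_mean)
                    for x, y in enumerate(seg)) / sxx
        total += sum((y - y_mean - slope * (x - x_mean)) ** 2
                     for x, y in enumerate(seg))
    return math.sqrt(total / (n_boxes * box))


def dfa_exponent(seq, boxes=(8, 16, 32, 64, 128)):
    """Slope of log F(box) against log(box), fitted by least squares."""
    walk = dna_walk(seq)
    xs = [math.log(b) for b in boxes]
    ys = [math.log(fluctuation(walk, b)) for b in boxes]
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
            / sum((x - x_mean) ** 2 for x in xs))


# Synthetic uncorrelated sequence, used only to exercise the functions.
rng = random.Random(0)
random_seq = "".join(rng.choice("ACGT") for _ in range(20000))
alpha = dfa_exponent(random_seq)
```

    For an uncorrelated sequence the exponent should come out near 0.5; values clearly above 0.5 would point to long-range correlation of the kind the abstract reports for noncoding DNA.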

  17. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    NASA Astrophysics Data System (ADS)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C.-K.; Simons, M.; Stanley, H. E.

    1995-09-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of the coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, while (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates, such as primates and rodents, and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of zero- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.

  18. Statistical analysis of nucleotide runs in coding and noncoding DNA sequences.

    PubMed

    Sprizhitsky YuA; Nechipurenko YuD; Alexandrov, A A; Volkenstein, M V

    1988-10-01

    A statistical analysis of the occurrence of particular nucleotide runs in DNA sequences of different species has been carried out. There are considerable differences of run distributions in DNA sequences of procaryotes, invertebrates and vertebrates. There is an abundance of short runs (1-2 nucleotides long) in the coding sequences and there is a deficiency of such runs in the noncoding regions. However, some interesting exceptions from this rule exist for the run distribution of adenine in procaryotes and for the arrangement of purine-pyrimidine runs in eucaryotes. The similarity in the distributions of such runs in the coding and noncoding regions may be due to some structural features of the DNA molecule as a whole. Runs of guanine (or cytosine) of three to six nucleotides occur predominantly in noncoding DNA regions in eucaryotes, especially in vertebrates.
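
    The kind of run statistics described in this abstract can be tabulated with a short Python sketch. It is a hypothetical illustration only: the function names and the R/Y recoding used for purine-pyrimidine runs are assumptions, not the authors' procedure.

```python
from collections import Counter
from itertools import groupby


def run_length_distribution(seq):
    """Map each symbol to a Counter of {run length: number of runs}."""
    dist = {}
    for symbol, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        dist.setdefault(symbol, Counter())[length] += 1
    return dist


def purine_pyrimidine(seq):
    """Recode A/G as 'R' (purine) and every other symbol as 'Y' (assumed)."""
    return "".join("R" if base in "AG" else "Y" for base in seq.upper())


example = "AAGGGCTT"  # made-up sequence, not data from the study
base_runs = run_length_distribution(example)
ry_runs = run_length_distribution(purine_pyrimidine(example))
```

    Comparing such distributions between annotated coding and noncoding segments is the sort of contrast the abstract draws, for instance the share of runs of length 1-2 or of guanine runs of length three to six.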

  19. Amino acid sequences of proteins from Leptospira serovar pomona.

    PubMed

    Alves, S F; Lefebvre, R B; Probert, W

    2000-01-01

    This report describes partial amino acid sequences from three putative outer envelope proteins from Leptospira serovar pomona. In order to obtain internal fragments for protein sequencing, enzymatic and chemical digestion was performed. The enzyme clostripain was used to digest the 32 and 45 kDa proteins. In situ digestion of the 40 kDa protein was accomplished using cyanogen bromide. The 32 kDa protein generated two fragments, one of 21 kDa and another of 10 kDa that yielded five residues. A fragment of 24 kDa that yielded nineteen amino acid residues was obtained from the 45 kDa protein. A fragment with a molecular weight of 20 kDa, yielding a twenty-amino-acid sequence, was obtained from the 40 kDa protein.

  20. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

    SciTech Connect

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; Lamson, Jacob S.; He, Jennifer; Hoover, Cindi A.; Blow, Matthew J.; Bristow, James; Butland, Gareth; Arkin, Adam P.; Deutschbauer, Adam

    2015-05-12

    Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes

  1. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

    DOE PAGES

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; Lamson, Jacob S.; He, Jennifer; Hoover, Cindi A.; Blow, Matthew J.; Bristow, James; Butland, Gareth; Arkin, Adam P.; et al

    2015-05-12

    Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes. However, the current strategies for TnSeq are

  2. In silico comparative analysis of DNA and amino acid sequences for prion protein gene.

    PubMed

    Kim, Y; Lee, J; Lee, C

    2008-01-01

    Genetic variability might contribute to species specificity of prion diseases in various organisms. In this study, structures of the prion protein gene (PRNP) and its amino acids were compared among species for which sequence data were available. Comparisons of PRNP DNA sequences among 12 species including human, chimpanzee, monkey, bovine, ovine, dog, mouse, rat, wallaby, opossum, chicken and zebrafish allowed us to identify candidate regulatory regions in intron 1 and 3'-untranslated region (UTR) in addition to the coding region. Highly conserved putative binding sites for transcription factors, such as heat shock factor 2 (HSF2) and myocyte enhancer factor 2 (MEF2), were discovered in the intron 1. In 3'-UTR, the functional sequence (ATTAAA) for nucleus-specific polyadenylation was found in all the analysed species. The functional sequence (TTTTTAT) for maturation-specific polyadenylation was identically observed only in ovine, and one or two nucleotide mismatches in the other species. A comparison of the amino acid sequences in 53 species revealed a large sequence identity. Especially the octapeptide repeat region was observed in all the species but frog and zebrafish. Functional changes and susceptibility to prion diseases with various isoforms of prion protein could be caused by numeric variability and conformational changes discovered in the repeat sequences.

  3. The amino acid sequence of Staphylococcus aureus penicillinase.

    PubMed Central

    Ambler, R P

    1975-01-01

    The amino acid sequence of the penicillinase (penicillin amido-beta-lactamhydrolase, EC 3.5.2.6) from Staphylococcus aureus strain PC1 was determined. The protein consists of a single polypeptide chain of 257 residues, and the sequence was determined by characterization of tryptic, chymotryptic, peptic and CNBr peptides, with some additional evidence from thermolysin and S. aureus proteinase peptides. A mistake in the preliminary report of the sequence is corrected; residues 113-116 are now thought to be -Lys-Lys-Val-Lys- rather than -Lys-Val-Lys-Lys-. Detailed evidence for the amino acid sequence has been deposited as Supplementary Publication SUP 50056 (91 pages) at the British Library (Lending Division), Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms given in Biochem. J. (1975) 145, 5. PMID:1218078

  4. Sequence analysis and identification of new variations in the coding sequence of melatonin receptor gene (MTNR1A) of Indian Chokla sheep breed

    PubMed Central

    Saxena, Vijay Kumar; Jha, Bipul Kumar; Meena, Amar Singh; Naqvi, S.M.K.

    2014-01-01

    Melatonin receptor 1A gene is the prime receptor mediating the effect of melatonin at the neuroendocrine level for control of seasonal reproduction in sheep. The aims of this study were to examine the polymorphism pattern of the coding sequence of the MTNR1A gene in Chokla sheep, a breed of the Indian arid tract, and to identify new variations in relation to its aseasonal status. Genomic DNAs of 101 Chokla sheep were collected and an 824 bp coding sequence of Exon II was amplified. RFLP was performed with the enzymes RsaI and MnlI to assess the presence of polymorphism at positions C606T and G612A, respectively. Genotyping revealed significantly higher frequency of M and R alleles than m and r alleles. RR and MM were found to be dominantly present in the group of studied population. Cloning and sequencing of Exon II followed by mutation/polymorphism analysis revealed ten mutations of which three were non-synonymous mutations (G706A, C893A, G931C). G706A leads to substitution of valine by isoleucine Val125I (U14109) in the fifth transmembrane domain. C893A leads to substitution of alanine by aspartic acid in the third extracellular loop. The G931C mutation brings about substitution of the amino acid alanine by proline in the seventh transmembrane helix, which can affect the conformational stability of the molecule. Polyphen-2 analysis revealed that the polymorphism at position 931 is potentially damaging while the mutations at positions 706 and 893 were benign. It is concluded that the G931C mutation of the MTNR1A gene may explain, in part, the importance of melatonin structure integrity in influencing seasonality in sheep. PMID:25606429

  5. Large-scale coding sequence change underlies the evolution of postdevelopmental novelty in honey bees.

    PubMed

    Jasper, William Cameron; Linksvayer, Timothy A; Atallah, Joel; Friedman, Daniel; Chiu, Joanna C; Johnson, Brian R

    2015-02-01

    Whether coding or regulatory sequence change is more important to the evolution of phenotypic novelty is one of biology's major unresolved questions. The field of evo-devo has shown that in early development changes to regulatory regions are the dominant mode of genetic change, but whether this extends to the evolution of novel phenotypes in the adult organism is unclear. Here, we conduct ten RNA-Seq experiments across both novel and conserved tissues in the honey bee to determine to what extent postdevelopmental novelty is based on changes to the coding regions of genes. We make several discoveries. First, we show that with respect to novel physiological functions in the adult animal, positively selected tissue-specific genes of high expression underlie novelty by conferring specialized cellular functions. Such genes are often, but not always taxonomically restricted genes (TRGs). We further show that positively selected genes, whether TRGs or conserved genes, are the least connected genes within gene expression networks. Overall, this work suggests that the evo-devo paradigm is limited, and that the evolution of novelty, postdevelopment, follows additional rules. Specifically, evo-devo stresses that high network connectedness (repeated use of the same gene in many contexts) constrains coding sequence change as it would lead to negative pleiotropic effects. Here, we show that in the adult animal, the converse is true: Genes with low network connectedness (TRGs and tissue-specific conserved genes) underlie novel phenotypes by rapidly changing coding sequence to perform new-specialized functions. PMID:25351750

  6. Large-scale coding sequence change underlies the evolution of postdevelopmental novelty in honey bees.

    PubMed

    Jasper, William Cameron; Linksvayer, Timothy A; Atallah, Joel; Friedman, Daniel; Chiu, Joanna C; Johnson, Brian R

    2015-02-01

    Whether coding or regulatory sequence change is more important to the evolution of phenotypic novelty is one of biology's major unresolved questions. The field of evo-devo has shown that in early development changes to regulatory regions are the dominant mode of genetic change, but whether this extends to the evolution of novel phenotypes in the adult organism is unclear. Here, we conduct ten RNA-Seq experiments across both novel and conserved tissues in the honey bee to determine to what extent postdevelopmental novelty is based on changes to the coding regions of genes. We make several discoveries. First, we show that with respect to novel physiological functions in the adult animal, positively selected tissue-specific genes of high expression underlie novelty by conferring specialized cellular functions. Such genes are often, but not always taxonomically restricted genes (TRGs). We further show that positively selected genes, whether TRGs or conserved genes, are the least connected genes within gene expression networks. Overall, this work suggests that the evo-devo paradigm is limited, and that the evolution of novelty, postdevelopment, follows additional rules. Specifically, evo-devo stresses that high network connectedness (repeated use of the same gene in many contexts) constrains coding sequence change as it would lead to negative pleiotropic effects. Here, we show that in the adult animal, the converse is true: Genes with low network connectedness (TRGs and tissue-specific conserved genes) underlie novel phenotypes by rapidly changing coding sequence to perform new-specialized functions.

  7. The Evolution of Bony Vertebrate Enhancers at Odds with Their Coding Sequence Landscape

    PubMed Central

    Yousaf, Aisha; Sohail Raza, Muhammad; Ali Abbasi, Amir

    2015-01-01

    Enhancers lie at the heart of transcriptional and developmental gene regulation. Therefore, changes in enhancer sequences usually disrupt the target gene expression and result in disease phenotypes. Despite the well-established role of enhancers in development and disease, evolutionary sequence studies are lacking. The current study attempts to unravel the puzzle of bony vertebrates’ conserved noncoding elements (CNE) enhancer evolution. Bayesian phylogenetics of enhancer sequences spotlights promising interordinal relationships among placental mammals, proposing a closer relationship between humans and laurasiatherians while placing rodents at the basal position. Clock-based estimates of enhancer evolution provided a dynamic picture of interspecific rate changes across the bony vertebrate lineage. Moreover, coelacanth in the study augmented our appreciation of the vertebrate cis-regulatory evolution during water–land transition. Intriguingly, we observed a pronounced upsurge in enhancer evolution in land-dwelling vertebrates. These novel findings triggered us to further investigate the evolutionary trend of coding as well as CNE nonenhancer repertoires, to highlight the relative evolutionary dynamics of diverse genomic landscapes. Surprisingly, the evolutionary rates of enhancer sequences were clearly at odds with those of the coding and the CNE nonenhancer sequences during vertebrate adaptation to land, with land vertebrates exhibiting significantly reduced rates of coding sequence evolution in comparison to their fast evolving regulatory landscape. The observed variation in tetrapod cis-regulatory elements caused the fine-tuning of associated gene regulatory networks. Therefore, the increased evolutionary rate of tetrapods’ enhancer sequences might be responsible for the variation in developmental regulatory circuits during the process of vertebrate adaptation to land. PMID:26253316

  8. The Evolution of Bony Vertebrate Enhancers at Odds with Their Coding Sequence Landscape.

    PubMed

    Yousaf, Aisha; Sohail Raza, Muhammad; Ali Abbasi, Amir

    2015-08-06

    Enhancers lie at the heart of transcriptional and developmental gene regulation. Therefore, changes in enhancer sequences usually disrupt the target gene expression and result in disease phenotypes. Despite the well-established role of enhancers in development and disease, evolutionary sequence studies are lacking. The current study attempts to unravel the puzzle of bony vertebrates' conserved noncoding elements (CNE) enhancer evolution. Bayesian phylogenetics of enhancer sequences spotlights promising interordinal relationships among placental mammals, proposing a closer relationship between humans and laurasiatherians while placing rodents at the basal position. Clock-based estimates of enhancer evolution provided a dynamic picture of interspecific rate changes across the bony vertebrate lineage. Moreover, coelacanth in the study augmented our appreciation of the vertebrate cis-regulatory evolution during water-land transition. Intriguingly, we observed a pronounced upsurge in enhancer evolution in land-dwelling vertebrates. These novel findings triggered us to further investigate the evolutionary trend of coding as well as CNE nonenhancer repertoires, to highlight the relative evolutionary dynamics of diverse genomic landscapes. Surprisingly, the evolutionary rates of enhancer sequences were clearly at odds with those of the coding and the CNE nonenhancer sequences during vertebrate adaptation to land, with land vertebrates exhibiting significantly reduced rates of coding sequence evolution in comparison to their fast evolving regulatory landscape. The observed variation in tetrapod cis-regulatory elements caused the fine-tuning of associated gene regulatory networks. Therefore, the increased evolutionary rate of tetrapods' enhancer sequences might be responsible for the variation in developmental regulatory circuits during the process of vertebrate adaptation to land.

  9. The amino-acid sequence of kangaroo pancreatic ribonuclease.

    PubMed

    Gaastra, W; Welling, G W; Beintema, J J

    1978-05-01

    Red kangaroo (Macropus rufus) ribonuclease was isolated from pancreatic tissue by affinity chromatography. The amino acid sequence was determined by automatic sequencing of overlapping large fragments and by analysis of shorter peptides obtained by digestion with a number of proteolytic enzymes. The polypeptide chain consists of 122 amino acid residues. Compared to other ribonucleases, the N-terminal residue and residue 114 are deleted. In other pancreatic ribonucleases position 114 is occupied by a cis proline residue in an external loop at the surface of the molecule. Other remarkable substitutions are the presence of a tyrosine residue at position 123 instead of a serine which forms a hydrogen bond with the pyrimidine ring of a nucleotide substrate, and a number of hydrophobic-hydrophilic interchanges in the sequence 51-55, which forms part of an alpha-helix in bovine ribonuclease and exhibits few substitutions in the placental mammals. Kangaroo ribonuclease contains no carbohydrate, although the enzyme possesses a recognition site for carbohydrate attachment in the sequence Asn-Val-Thr (62-64). The enzyme differs at about 35-40% of the positions from all other mammalian pancreatic ribonucleases sequenced to date, which is in agreement with the early divergence between the marsupials and the placental mammals. From fragmentary data a tentative sequence of red-necked wallaby (Macropus rufogriseus) pancreatic ribonuclease has been derived. Eight differences with the kangaroo sequence were found.

  10. Prebiotically plausible mechanisms increase compositional diversity of nucleic acid sequences

    PubMed Central

    Derr, Julien; Manapat, Michael L.; Rajamani, Sudha; Leu, Kevin; Xulvi-Brunet, Ramon; Joseph, Isaac; Nowak, Martin A.; Chen, Irene A.

    2012-01-01

    During the origin of life, the biological information of nucleic acid polymers must have increased to encode functional molecules (the RNA world). Ribozymes tend to be compositionally unbiased, as is the vast majority of possible sequence space. However, ribonucleotides vary greatly in synthetic yield, reactivity and degradation rate, and their non-enzymatic polymerization results in compositionally biased sequences. While natural selection could lead to complex sequences, molecules with some activity are required to begin this process. Was the emergence of compositionally diverse sequences a matter of chance, or could prebiotically plausible reactions counter chemical biases to increase the probability of finding a ribozyme? Our in silico simulations using a two-letter alphabet show that template-directed ligation and high concatenation rates counter compositional bias and shift the pool toward longer sequences, permitting greater exploration of sequence space and stable folding. We verified experimentally that unbiased DNA sequences are more efficient templates for ligation, thus increasing the compositional diversity of the pool. Our work suggests that prebiotically plausible chemical mechanisms of nucleic acid polymerization and ligation could predispose toward a diverse pool of longer, potentially structured molecules. Such mechanisms could have set the stage for the appearance of functional activity very early in the emergence of life. PMID:22319215

  11. The amino-acid sequence of kangaroo pancreatic ribonuclease.

    PubMed

    Gaastra, W; Welling, G W; Beintema, J J

    1978-05-01

    Red kangaroo (Macropus rufus) ribonuclease was isolated from pancreatic tissue by affinity chromatography. The amino acid sequence was determined by automatic sequencing of overlapping large fragments and by analysis of shorter peptides obtained by digestion with a number of proteolytic enzymes. The polypeptide chain consists of 122 amino acid residues. Compared to other ribonucleases, the N-terminal residue and residue 114 are deleted. In other pancreatic ribonucleases position 114 is occupied by a cis proline residue in an external loop at the surface of the molecule. Other remarkable substitutions are the presence of a tyrosine residue at position 123 instead of a serine which forms a hydrogen bond with the pyrimidine ring of a nucleotide substrate, and a number of hydrophobic-hydrophilic interchanges in the sequence 51-55, which forms part of an alpha-helix in bovine ribonuclease and exhibits few substitutions in the placental mammals. Kangaroo ribonuclease contains no carbohydrate, although the enzyme possesses a recognition site for carbohydrate attachment in the sequence Asn-Val-Thr (62-64). The enzyme differs at about 35-40% of the positions from all other mammalian pancreatic ribonucleases sequenced to date, which is in agreement with the early divergence between the marsupials and the placental mammals. From fragmentary data a tentative sequence of red-necked wallaby (Macropus rufogriseus) pancreatic ribonuclease has been derived. Eight differences with the kangaroo sequence were found. PMID:658039

  12. Delta sequences in the 5' non-coding region of yeast tRNA genes

    PubMed Central

    Gafner, Jürg; Robertis, Eddy M.De; Philippsen, Peter

    1983-01-01

    Two so far undetected tRNA genes were found close to delta (δ) sequences at the sup4 locus on chromosome X in the genome of Saccharomyces cerevisiae. The two genes were identified from their abundant transcription products in frog oocytes. Hybridisation experiments allowed the mapping of the transcripts in cloned DNA and DNA sequence analysis revealed the presence of one AGGtRNAArg and one GACtRNAAsp gene. tRNAAsp genes with sequences similar or identical to GACtRNAAsp exist in 14-16 copies per haploid yeast genome, whereas only one copy was detected for AGGtRNAArg. In vivo labelling of total yeast tRNA with 32P followed by hybridisation revealed that the unique AGGtRNAArg gene is transcribed in S. cerevisiae. δ sequences are present 120 bp upstream from the first coding nucleotide in the case of AGGtRNAArg, 80 bp in the case of GACtRNAAsp and 405 bp in the case of the known UACtRNATyr (sup4) gene. δ sequences, as part of Ty elements or alone, were also found by other investigators at similar distances upstream of the mRNA start in mutant alleles of protein-coding yeast genes. Although protein-coding genes are transcribed by RNA polymerase II and tRNA genes by RNA polymerase III, the 5' non-coding region of both types of genes could conceivably have a peculiar DNA or chromatin structure used as preferred landing sites by transposable elements. PMID:16453444

  13. Characterization of the Suillus grevillei quinone synthetase GreA supports a nonribosomal code for aromatic α-keto acids.

    PubMed

    Wackler, Barbara; Lackner, Gerald; Chooi, Yit Heng; Hoffmeister, Dirk

    2012-08-13

    The gene greA was cloned from the genome of the basidiomycete Suillus grevillei. It encodes a monomodular natural product biosynthesis protein composed of three domains for adenylation, thiolation, and thioesterase and, hence, is reminiscent of a nonribosomal peptide synthetase (NRPS). GreA was biochemically characterized in vitro. It was identified as atromentin synthetase and therefore represents one of only a limited number of biochemically characterized NRPS-like enzymes which accept an aromatic α-keto acid. Specificity-conferring amino acid residues--collectively referred to as the nonribosomal code--were predicted for the primary sequence of the GreA adenylation domain and were an unprecedented combination for aromatic α-keto acids. Plausible support for this new code came from in silico simulation of the adenylation domain structure. According to the model, the predicted residues line the active site and, therefore, very likely contribute to substrate specificity.

  14. Widespread Differential Expression of Coding Region and 3' UTR Sequences in Neurons and Other Tissues.

    PubMed

    Kocabas, Arif; Duarte, Terence; Kumar, Saranya; Hynes, Mary A

    2015-12-16

    Mature messenger RNAs (mRNAs) consist of coding sequence (CDS) and 5' and 3' UTRs, typically expected to show similar abundance within a given neuron. Examining mRNA from defined neurons, we unexpectedly show extremely common unbalanced expression of cognate 3' UTR and CDS sequences; many genes show high 3' UTR relative to CDS, others show high CDS to 3' UTR. In situ hybridization (19 of 19 genes) shows a broad range of 3' UTR-to-CDS expression ratios across neurons and tissues. Ratios may be spatially graded or change with developmental age but are consistent across animals. Further, for two genes examined, a 3' UTR-to-CDS ratio above a particular threshold in any given neuron correlated with reduced or undetectable protein expression. Our findings raise questions about the role of isolated 3' UTR sequences in regulation of protein expression and highlight the importance of separately examining 3' UTR and CDS sequences in gene expression analyses.

  15. Cloning, sequencing, and expression of the mig gene of Mycobacterium avium, which codes for a secreted macrophage-induced protein.

    PubMed Central

    Plum, G; Brenden, M; Clark-Curtiss, J E; Pulverer, G

    1997-01-01

    Mycobacterium avium is an intracellular pathogen that has evolved to be a frequent cause of disseminated infection in immunocompromised patients. Although these bacilli are readily phagocytized, they are able to survive and even multiply within human macrophages. The process whereby mycobacteria circumvent the lytic functions of the macrophages is currently not well understood, but this is a key aspect in the pathogenicity of all pathogenic mycobacteria. Previously, we identified a gene in M. avium, designated mig (for macrophage-induced gene), the expression of which is induced when the bacilli grow in human macrophages (G. Plum and J. E. Clark-Curtiss, Infect. Immun. 62:476-483, 1994). In the present study we show that (i) the nucleotide sequence of the mig gene has an open reading frame of 295 amino acids with a strong bias for mycobacterial codon usage, (ii) the mig gene also codes for a putative signal peptide of 19 amino acid residues, (iii) mig is induced by acidity to be expressed as an early-secreted 30-kDa protein, and (iv) the Mig protein exhibits an AMP-binding domain signature. However, beyond this motif which is common to enzymes that activate a large variety of substrates, no homologies to known sequences are found. We also show that (v) Mycobacterium smegmatis strains expressing the Mig protein have a limited advantage for survival in macrophages. These findings may be concordant with a role of the mig gene in the virulence of M. avium. PMID:9353032

  16. Sequence Prediction With Sparse Distributed Hyperdimensional Coding Applied to the Analysis of Mobile Phone Use Patterns.

    PubMed

    Rasanen, Okko J; Saarinen, Jukka P

    2016-09-01

    Modeling and prediction of temporal sequences is central to many signal processing and machine learning applications. Prediction based on sequence history is typically performed using parametric models, such as fixed-order Markov chains (n-grams), approximations of high-order Markov processes, such as mixed-order Markov models or mixtures of lagged bigram models, or with other machine learning techniques. This paper presents a method for sequence prediction based on sparse hyperdimensional coding of the sequence structure and describes how higher order temporal structures can be utilized in sparse coding in a balanced manner. The method is purely incremental, allowing real-time online learning and prediction with limited computational resources. Experiments with prediction of mobile phone use patterns, including the prediction of the next launched application, the next GPS location of the user, and the next artist played with the phone media player, reveal that the proposed method is able to capture the relevant variable-order structure from the sequences. In comparison with the n-grams and the mixed-order Markov models, the sparse hyperdimensional predictor clearly outperforms its peers in terms of unweighted average recall and achieves an equal level of weighted average recall as the mixed-order Markov chain but without the batch training of the mixed-order model.
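
    The fixed-order Markov chain (n-gram) baseline mentioned in this abstract can be sketched roughly as follows. This is only an illustration of the baseline, not the sparse hyperdimensional method itself; the function names `train_ngram` and `predict_next` and the toy app-launch sequence are assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_ngram(sequence, order=2):
    """Count which symbol follows each context of `order` consecutive symbols."""
    counts = defaultdict(Counter)
    for i in range(order, len(sequence)):
        context = tuple(sequence[i - order:i])
        counts[context][sequence[i]] += 1
    return counts


def predict_next(counts, history, order=2):
    """Return the most frequent successor of the last `order` symbols, or None if unseen."""
    context = tuple(history[-order:])
    if context not in counts:
        return None
    return counts[context].most_common(1)[0][0]


# Hypothetical sequence of launched applications.
apps = ["mail", "maps", "music", "mail", "maps", "music", "mail", "maps"]
model = train_ngram(apps, order=2)
print(predict_next(model, ["mail", "maps"]))
```

    A fixed order is the main limitation of such a model: contexts never seen in training yield no prediction, which is one motivation for the variable-order approaches the abstract compares against.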

  17. SinEx DB: a database for single exon coding sequences in mammalian genomes.

    PubMed

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33, and the complete dataset of SEG sequences and their functional predictions is available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches against the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl.
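
    The SEG definition used here (no introns within the CDS, although introns in untranslated regions are not excluded by that wording) can be expressed as a simple coordinate check. The sketch below is an illustration only; the function name, the inclusive-coordinate convention and the example gene model are assumptions and do not describe how SinEx DB itself predicts SEGs.

```python
def is_single_exon_cds(exons, cds_start, cds_end):
    """Return True if the whole CDS lies inside one exon (no intron within the CDS).

    exons: list of (start, end) tuples, inclusive coordinates on one strand.
    cds_start, cds_end: inclusive coordinates of the coding sequence.
    """
    return any(start <= cds_start and cds_end <= end for start, end in exons)


# Hypothetical gene model: an intron in the 5' UTR, but the CDS sits in one exon.
exons = [(100, 200), (300, 900)]
print(is_single_exon_cds(exons, 350, 800))   # CDS contained in the second exon
print(is_single_exon_cds(exons, 150, 800))   # CDS spans the intron
```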

  18. SinEx DB: a database for single exon coding sequences in mammalian genomes

    PubMed Central

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F.; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S.

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as ‘single exon genes’ (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33, and the complete dataset of SEG sequences and their functional predictions is available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches against the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl PMID:27278816

  20. Development of an expert system for amino acid sequence identification.

    PubMed

    Hu, L; Saulinskas, E F; Johnson, P; Harrington, P B

    1996-08-01

    An expert system for amino acid sequence identification has been developed. The algorithm uses heuristic rules developed by human experts in protein sequencing. The system is applied to the chromatographic data of phenylthiohydantoin-amino acids acquired from an automated sequencer. The peak intensities in the current cycle are compared with those in the previous cycle, while the calibration and succeeding cycles are used as ancillary identification criteria when necessary. The retention time for each chromatographic peak in each cycle is corrected by the corresponding peak in the calibration cycle at the same run. The main improvement of our system compared with the onboard software used by the Applied Biosystems 477A Protein/Peptide Sequencer is that each peak in each cycle is assigned an identification name according to the corrected retention time to be used for the comparison with different cycles. The system was developed from analyses of ribonuclease A and evaluated by runs of four other protein samples that were not used in rule development. This paper demonstrates that rules developed by human experts can be automatically applied to sequence assignment. The expert system performed more accurately than the onboard software of the protein sequencer, in that the misidentification rates for the expert system were around 7%, whereas those for the onboard software were between 13 and 21%.

  1. Oxytocin receptor gene sequences in owl monkeys and other primates show remarkable interspecific regulatory and protein coding variation.

    PubMed

    Babb, Paul L; Fernandez-Duque, Eduardo; Schurr, Theodore G

    2015-10-01

    The oxytocin (OT) hormone pathway is involved in numerous physiological processes, and one of its receptor genes (OXTR) has been implicated in pair bonding behavior in mammalian lineages. This observation is important for understanding social monogamy in primates, which occurs in only a small subset of taxa, including Azara's owl monkey (Aotus azarae). To examine the potential relationship between social monogamy and OXTR variation, we sequenced its 5' regulatory (4936bp) and coding (1167bp) regions in 25 owl monkeys from the Argentinean Gran Chaco, and examined OXTR sequences from 1092 humans from the 1000 Genomes Project. We also assessed interspecific variation of OXTR in 25 primate and rodent species that represent a set of phylogenetically and behaviorally disparate taxa. Our analysis revealed substantial variation in the putative 5' regulatory region of OXTR, with marked structural differences across primate taxa, particularly for humans and chimpanzees, which exhibited unique patterns of large motifs of dinucleotide A+T repeats upstream of the OXTR 5' UTR. In addition, we observed a large number of amino acid substitutions in the OXTR CDS region among New World primate taxa that distinguish them from Old World primates. Furthermore, primate taxa traditionally defined as socially monogamous (e.g., gibbons, owl monkeys, titi monkeys, and saki monkeys) all exhibited different amino acid motifs for their respective OXTR protein coding sequences. These findings support the notion that monogamy has evolved independently in Old World and New World primates, and that it has done so through different molecular mechanisms, not exclusively through the oxytocin pathway. PMID:26025428

  3. Radiographic image sequence coding using adaptive finite-state vector quantization

    NASA Astrophysics Data System (ADS)

    Joo, Chang-Hee; Choi, Jong S.

    1990-11-01

    Vector quantization is an effective spatial domain image coding technique at under 1.0 bits per pixel. To achieve the quality at lower rates it is necessary to exploit spatial redundancy over a larger region of pixels than is possible with memoryless VQ. A finite-state vector quantizer can achieve the same performance as memoryless VQ at lower rates. This paper describes an adaptive finite-state vector quantization for radiographic image sequence coding. Simulation experiments have been carried out with 4×4 blocks of pixels from a sequence of cardiac angiograms consisting of 40 frames of size 256×256 pixels each. At 0.45 bpp the resulting adaptive FSVQ encoder achieves performance comparable to earlier memoryless VQs at 0.8 bpp.
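
    To make the bit rates concrete, the sketch below shows plain memoryless VQ of a 4×4 block and the rate implied by a given codebook size. It assumes fixed-length index coding and a tiny hypothetical codebook; it is not the adaptive finite-state scheme of the paper, whose details the abstract does not give.

```python
import math


def bits_per_pixel(codebook_size, block_pixels=16):
    """Rate of memoryless VQ with fixed-length indices: log2(K) bits per block."""
    return math.log2(codebook_size) / block_pixels


def nearest_codeword(block, codebook):
    """Return the index of the codeword closest to `block` in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(block, codebook[k]))


# Hypothetical 3-entry codebook of flattened 4x4 blocks (16 pixel values each).
codebook = [[0] * 16, [12] * 16, [255] * 16]
block = [10] * 16
print(nearest_codeword(block, codebook))   # index of the best-matching codeword
print(bits_per_pixel(256))                 # a 256-entry codebook on 4x4 blocks
```

    Under the same fixed-length assumption, 0.45 bpp on 4×4 blocks corresponds to about 7.2 bits per block, and 0.8 bpp to about 12.8 bits per block.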

  4. Inference of Episodic Changes in Natural Selection Acting on Protein Coding Sequences via CODEML.

    PubMed

    Bielawski, Joseph P; Baker, Jennifer L; Mingrone, Joseph

    2016-01-01

    This unit provides protocols for using the CODEML program from the PAML package to make inferences about episodic natural selection in protein-coding sequences. The protocols cover inference tasks such as maximum likelihood estimation of selection intensity, testing the hypothesis of episodic positive selection, and identifying sites with a history of episodic evolution. We provide protocols for using the rich set of models implemented in CODEML to assess robustness, and for using bootstrapping to assess if the requirements for reliable statistical inference have been met. An example dataset is used to illustrate how the protocols are used with real protein-coding sequences. The workflow of this design, through automation, is readily extendable to a larger-scale evolutionary survey. © 2016 by John Wiley & Sons, Inc. PMID:27322407

  5. Affinity regression predicts the recognition code of nucleic acid binding proteins

    PubMed Central

    Pelossof, Raphael; Singh, Irtisha; Yang, Julie L.; Weirauch, Matthew T.; Hughes, Timothy R.; Leslie, Christina S.

    2016-01-01

    Predicting the affinity profiles of nucleic acid-binding proteins directly from the protein sequence is a major unsolved problem. We present a statistical approach for learning the recognition code of a family of transcription factors (TFs) or RNA-binding proteins (RBPs) from high-throughput binding assays. Our method, called affinity regression, trains on protein binding microarray (PBM) or RNAcompete experiments to learn an interaction model between proteins and nucleic acids, using only protein domain and probe sequences as inputs. By training on mouse homeodomain PBM profiles, our model correctly identifies residues that confer DNA-binding specificity and accurately predicts binding motifs for an independent set of divergent homeodomains. Similarly, learning from RNAcompete profiles for diverse RBPs, our model can predict the binding affinities of held-out proteins and identify key RNA-binding residues. More broadly, we envision applying our method to model and predict biological interactions in any setting where there is a high-throughput ‘affinity’ readout. PMID:26571099

  6. Using a color-coded ambigraphic nucleic acid notation to visualize conserved palindromic motifs within and across genomes

    PubMed Central

    2014-01-01

    Background: Ambiscript is a graphically-designed nucleic acid notation that uses symbol symmetries to support sequence complementation, highlight biologically-relevant palindromes, and facilitate the analysis of consensus sequences. Although the original Ambiscript notation was designed to easily represent consensus sequences for multiple sequence alignments, the notation’s black-on-white ambiguity characters are unable to reflect the statistical distribution of nucleotides found at each position. We now propose a color-augmented ambigraphic notation to encode the frequency of positional polymorphisms in these consensus sequences. Results: We have implemented this color-coding approach by creating an Adobe Flash® application (http://www.ambiscript.org) that shades and colors modified Ambiscript characters according to the prevalence of the encoded nucleotide at each position in the alignment. The resulting graphic helps viewers perceive biologically-relevant patterns in multiple sequence alignments by uniquely combining color, shading, and character symmetries to highlight palindromes and inverted repeats in conserved DNA motifs. Conclusion: Juxtaposing an intuitive color scheme over the deliberate character symmetries of an ambigraphic nucleic acid notation yields a highly-functional nucleic acid notation that maximizes information content and successfully embodies key principles of graphic excellence put forth by the statistician and graphic design theorist, Edward Tufte. PMID:24447494

  7. How is the serial order of a verbal sequence coded? Some comparisons between models.

    PubMed

    Hitch, Graham J; Fastame, Maria Chiara; Flude, Brenda

    2005-01-01

    Current models of verbal short-term memory (STM) propose various mechanisms for serial order. These include a gradient of activation over items, associations between items, and associations between items and their positions relative to the start or end of a sequence. We compared models using a variant of Hebb's procedure in which immediate serial recall of a sequence improves if the sequence is presented more than once. However, instead of repeating a complete sequence, we repeated different aspects of serial order information common to training lists and a subsequent test list. In Experiment 1, training lists repeated all the item-item pairings in the test list, with or without the position-item pairings in the test list. Substantial learning relative to a control condition was observed only when training lists repeated item-item pairs with position-item pairs, and position was defined relative to the start rather than end of a sequence. Experiment 2 attempted to analyse the basis of this learning effect further by repeating fragments of the test list during training, where fragments consisted of either isolated position-item pairings or clusters of both position-item and item-item pairings. Repetition of sequence fragments led to only weak learning effects. However, where learning was observed it was for specific position-item pairings. We conclude that positional cues play an important role in the coding of serial order in memory but that the information required to learn a sequence goes beyond position-item associations. We suggest that whereas STM for a novel sequence is based on positional cues, learning a sequence involves the development of some additional representation of the sequence as a whole.

  8. Rapid Quantification of Mutant Fitness in Diverse Bacteria by Sequencing Randomly Bar-Coded Transposons

    PubMed Central

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; Lamson, Jacob S.; He, Jennifer; Hoover, Cindi A.; Blow, Matthew J.; Bristow, James; Butland, Gareth

    2015-01-01

    Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative d-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. PMID:25968644

  9. A novel DNA sequence similarity calculation based on simplified pulse-coupled neural network and Huffman coding

    NASA Astrophysics Data System (ADS)

    Jin, Xin; Nie, Rencan; Zhou, Dongming; Yao, Shaowen; Chen, Yanyan; Yu, Jiefu; Wang, Quan

    2016-11-01

    A novel method for calculating DNA sequence similarity is proposed, based on a simplified pulse-coupled neural network (S-PCNN) and Huffman coding. We propose a coding method based on Huffman coding, in which the triplet code is used as a code bit to transform a DNA sequence into a numerical sequence. The method uses the firing characteristics of S-PCNN neurons to extract features from the DNA sequence, and it can handle DNA sequences of different lengths. First, the DNA primary sequence is encoded using the Huffman coding method; the S-PCNN is then used to extract the oscillation time sequence (OTS) of the encoded DNA sequence. Relevant features are obtained at the same time, and finally the similarities or dissimilarities of the DNA sequences are determined by Euclidean distance. To verify the accuracy of the method, different data sets were used for testing. The experimental results show that the proposed method is effective.
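
    The first step described above, Huffman coding over nucleotide triplets, can be illustrated with a generic Huffman construction. This is a sketch under stated assumptions: the abstract does not specify how the code words are mapped to a numerical sequence or how ties are broken, so the helper names (`triplets`, `huffman_codes`) and the toy sequence are hypothetical, and the S-PCNN stage is not shown.

```python
import heapq
from collections import Counter
from itertools import count


def triplets(dna):
    """Split a DNA string into non-overlapping triplets, dropping any incomplete tail."""
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def huffman_codes(symbols):
    """Build a prefix-free binary code in which frequent symbols get shorter code words."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(n, next(tie), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, codes1 = heapq.heappop(heap)
        n2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


dna = "ATGATGATGCCCATGTTT"
codes = huffman_codes(triplets(dna))
encoded = "".join(codes[t] for t in triplets(dna))
print(codes, len(encoded))
```

    Because code-word length depends on triplet frequency, sequences of different lengths can be encoded without padding, which is consistent with the abstract's remark about handling sequences of different lengths.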

  10. Sequence and developmental expression of mRNA coding for a gap junction protein in Xenopus

    PubMed Central

    1988-01-01

    Cloned complementary DNAs representing the complete coding sequence for an embryonic gap junction protein in the frog Xenopus laevis have been isolated and sequenced. The cDNAs hybridize with an RNA of 1.5 kb that is first detected in gastrulating embryos and accumulates throughout gastrulation and neurulation. By the tailbud stage, the highest abundance of the transcript is found in the region containing ventroposterior endoderm and the rudiment of the liver. In the adult, transcripts are present in the lungs, alimentary tract organs, and kidneys, but are not detected in the brain, heart, body wall and skeletal muscles, spleen, or ovary. The gene encoding this embryonic gap junction protein is present in only one or a few copies in the frog genome. In vitro translation of RNA synthesized from the cDNA template produces a 30-kD protein, as predicted by the coding sequence. This product has extensive sequence similarity to mammalian gap junction proteins in its putative transmembrane and extracellular domains, but has diverged substantially in two of its intracellular domains. PMID:2843548

  11. The genome of RNA tumor viruses contains polyadenylic acid sequences.

    PubMed

    Green, M; Cartas, M

    1972-04-01

    The 70S genome of two RNA tumor viruses, murine sarcoma virus and avian myeloblastosis virus, binds to Millipore filters in buffer with high salt concentration and to glass fiber filters containing poly(U). These observations suggest that 70S RNA contains adenylic acid-rich sequences. When digested by pancreatic RNase, 70S RNA of murine sarcoma virus yielded poly(A) sequences that contain 91% adenylic acid. These poly(A) sequences sedimented as a relatively homogenous peak in sucrose gradients with a sedimentation coefficient of 4-5 S, but had a mobility during polyacrylamide gel electrophoresis that corresponds to molecules that sediment at 6-7 S. If we estimate a molecular weight for each sequence of 30,000-60,000 (100-200 nucleotides) and a molecular weight for viral 70S RNA of 3-12 million, each viral genome could contain 1-8 poly(A) sequences. Possible functions of poly(A) in the infecting viral RNA may include a role in the initiation of viral DNA or RNA synthesis, in protein maturation, or in the assembly of the viral genome.

  12. Construction and Analysis of Novel 2-D Optical Orthogonal Codes Based on Extended Quadratic Congruence Codes and Modified One-Coincidence Sequence

    NASA Astrophysics Data System (ADS)

    Ji, Jianhua; Li, Wenjun; Zheng, Hongxia

    2016-06-01

    A new two-dimensional optical orthogonal code (OOC) named EQC/MOCS is constructed, using Extended Quadratic Congruence (EQC) code for time spreading and modified one-coincidence sequence (MOCS) for wavelength hopping. Compared with EQC/Prime code (PC), the number of wavelengths for EQC/MOCS is not limited to a prime number. Compared with EQC/OCS, the length of MOCS need not be expanded to the same length of EQC. EQC/MOCS can be constructed flexibly, and also has larger code cardinality.

  13. Adenovirus E1A coding sequences that enable ras and pmt oncogenes to transform cultured primary cells.

    PubMed Central

    Zerler, B; Moran, B; Maruyama, K; Moomaw, J; Grodzicker, T; Ruley, H E

    1986-01-01

    Plasmids expressing partial adenovirus early region 1A (E1A) coding sequences were tested for activities which facilitate in vitro establishment (immortalization) of primary baby rat kidney cells and which enable the T24 Harvey ras-related oncogene and the polyomavirus middle T antigen (pmt) gene to transform primary baby rat kidney cells. E1A cDNAs expressing the 289- and 243-amino acid proteins expressed both E1A transforming functions. Mutant hrA, which encodes a 140-amino acid protein derived from the amino-terminal domain shared by the 289- and 243-amino acid proteins, enabled ras (but not pmt) to transform and facilitated in vitro establishment to a low, but detectable, extent. These studies suggest that E1A functions which collaborate with ras oncogenes and those which facilitate establishment are linked. Furthermore, E1A transforming functions are not associated with activities of the 289-amino acid E1A proteins required for efficient transcriptional activation of viral early region promoters. PMID:3022137

  14. Sequences Of Amino Acids For Human Serum Albumin

    NASA Technical Reports Server (NTRS)

    Carter, Daniel C.

    1992-01-01

    Sequences of amino acids defined for use in making polypeptides one-third to one-sixth as large as parent human serum albumin molecule. Smaller, chemically stable peptides have diverse applications including service as artificial human serum and as active components of biosensors and chromatographic matrices. In applications involving production of artificial sera from new sequences, little or no concern about viral contaminants. Smaller genetically engineered polypeptides more easily expressed and produced in large quantities, making commercial isolation and production more feasible and profitable.

  15. Cloning and sequencing of a gene coding for an actin binding protein of Saccharomyces exiguus.

    PubMed

    Lange, U; Steiner, S; Grolig, F; Wagner, G; Philippsen, P

    1994-03-01

    The actin binding protein Abp1p of the yeast Saccharomyces cerevisiae is thought to be involved in the spatial organisation of cell surface growth. It contains a potential actin binding domain and an SH3 region, a common motif of many signal transduction proteins [1]. We have cloned and sequenced an ABP1 homologous gene of Saccharomyces exiguus, a yeast which is only distantly related to S. cerevisiae. The protein encoded by this gene is slightly larger than the respective S. cerevisiae protein (617 versus 592 amino acids). The two genes are 67.4% identical and the deduced amino acid sequences share an overall identity of 59.8%. The most conserved regions are the 148 N-terminal amino acids containing the potential actin binding site and the 58 C-terminal amino acids including the SH3 domain. In addition, both proteins contain a repeated motif of unknown function which is rich in glutamic acids with the sequence EEEEEEEAPAPSLPSR in the S. exiguus Abp1p. PMID:8110838
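
    Identity figures such as the 67.4% and 59.8% quoted above are computed from a pairwise alignment. The sketch below shows one common convention (matches divided by aligned, non-gap columns); the abstract does not state which denominator the authors used, so the function and the toy alignments are assumptions for illustration only.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Toy alignments, not taken from the Abp1p sequences.
print(percent_identity("ACGT", "ACGA"))
print(percent_identity("AC-T", "ACGT"))
```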

  16. Nucleic acid sequence design via efficient ensemble defect optimization.

    PubMed

    Zadeh, Joseph N; Wolfe, Brian R; Pierce, Niles A

    2011-02-01

    We describe an algorithm for designing the sequence of one or more interacting nucleic acid strands intended to adopt a target secondary structure at equilibrium. Sequence design is formulated as an optimization problem with the goal of reducing the ensemble defect below a user-specified stop condition. For a candidate sequence and a given target secondary structure, the ensemble defect is the average number of incorrectly paired nucleotides at equilibrium evaluated over the ensemble of unpseudoknotted secondary structures. To reduce the computational cost of accepting or rejecting mutations to a random initial sequence, candidate mutations are evaluated on the leaf nodes of a tree-decomposition of the target structure. During leaf optimization, defect-weighted mutation sampling is used to select each candidate mutation position with probability proportional to its contribution to the ensemble defect of the leaf. As subsequences are merged moving up the tree, emergent structural defects resulting from crosstalk between sibling sequences are eliminated via reoptimization within the defective subtree starting from new random subsequences. Using a Θ(N³) dynamic program to evaluate the ensemble defect of a target structure with N nucleotides, this hierarchical approach implies an asymptotic optimality bound on design time: for sufficiently large N, the cost of sequence design is bounded below by 4/3 the cost of a single evaluation of the ensemble defect for the full sequence. Hence, the design algorithm has time complexity Ω(N³). For target structures containing N ∈ {100, 200, 400, 800, 1600, 3200} nucleotides and duplex stems ranging from 1 to 30 base pairs, RNA sequence designs at 37°C typically succeed in satisfying a stop condition with ensemble defect less than N/100. Empirically, the sequence design algorithm exhibits asymptotic optimality and the exponent in the time complexity bound is sharp.
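
    The verbal definition of the ensemble defect given above can be written out as a formula. The notation below is an assumption introduced for illustration (the abstract gives no symbols): $\phi$ is the candidate sequence, $s$ the target structure, $P_{i,j}(\phi)$ the equilibrium probability that nucleotide $i$ pairs with nucleotide $j$, with column $N+1$ standing for "unpaired", and $S_{i,j}(s)$ the corresponding 0/1 matrix of the target structure.

```latex
% Expected number of correctly paired nucleotides is the overlap of P and S,
% so the ensemble defect is what remains out of N nucleotides.
n(\phi, s) \;=\; N \;-\; \sum_{i=1}^{N} \sum_{j=1}^{N+1} P_{i,j}(\phi)\, S_{i,j}(s)

% Stop condition quoted in the abstract (ensemble defect below N/100):
n(\phi, s) \;<\; \frac{N}{100}
```

    Under this notation, a sequence that adopts the target structure with probability one has $n(\phi, s) = 0$, and the quoted stop condition asks that, on average, fewer than 1% of nucleotides be incorrectly paired at equilibrium.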

  17. Origin of a novel protein-coding gene family with similar signal sequence in Schistosoma japonicum

    PubMed Central

    2012-01-01

    Background: Evolution of novel protein-coding genes is the bedrock of adaptive evolution. Recently, we identified six protein-coding genes with similar signal sequence from Schistosoma japonicum egg stage mRNA using signal sequence trap (SST). To find the mechanism underlying the origination of these genes with similar core promoter regions and signal sequence, we adopted an integrated approach utilizing whole genome, transcriptome and proteome database BLAST queries, other bioinformatics tools, and molecular analyses. Results: Our data, in combination with database analyses, showed evidence of expression of these genes both at the mRNA and protein levels exclusively in all developmental stages of S. japonicum. The signal sequence motif was identified in 27 distinct S. japonicum UniGene entries with multiple mRNA transcripts, and in 34 genome contigs distributed within 18 scaffolds with evidence of genome-wide dispersion. No homolog of these genes or similar domain was found in deposited data from any other organism. We observed a preponderance of flanking repetitive elements (REs), albeit partial copies, especially of the RTE-like and Perere class at either side of the duplication source locus. The role of REs as major mediators of DNA-level recombination leading to dispersive duplication is discussed with evidence from our analyses. We also identified a stepwise pathway towards functional selection in evolving genes by alternative splicing. Equally, the possible transcription models of some protein-coding representatives of the duplicons are presented with evidence of expression in vitro. Conclusion: Our findings contribute to the accumulating evidence of the role of REs in the generation of evolutionary novelties in organisms’ genomes. PMID:22716200

  18. Nanopores and nucleic acids: prospects for ultrarapid sequencing

    NASA Technical Reports Server (NTRS)

    Deamer, D. W.; Akeson, M.

    2000-01-01

    DNA and RNA molecules can be detected as they are driven through a nanopore by an applied electric field at rates ranging from several hundred microseconds to a few milliseconds per molecule. The nanopore can rapidly discriminate between pyrimidine and purine segments along a single-stranded nucleic acid molecule. Nanopore detection and characterization of single molecules represents a new method for directly reading information encoded in linear polymers. If single-nucleotide resolution can be achieved, it is possible that nucleic acid sequences can be determined at rates exceeding a thousand bases per second.

  19. General Strategy for the Design of DNA Coding Sequences Applied to Nanoparticle Assembly.

    PubMed

    Calais, Théo; Baijot, Vincent; Djafari Rouhani, Mehdi; Gauchard, David; Chabal, Yves J; Rossi, Carole; Estève, Alain

    2016-09-20

    The DNA-directed assembly of nano-objects has been the subject of many recent studies as a means to construct advanced nanomaterial architectures. Although much experimental in silico work has been presented and discussed, there has been no in-depth consideration of the proper design of single-strand sticky termination of DNA sequences, noted as ssST, which is important in avoiding self-folding within one DNA strand, unwanted strand-to-strand interaction, and mismatching. In this work, a new comprehensive and computationally efficient optimization algorithm is presented for the construction of all possible DNA sequences that specifically prevents these issues. This optimization procedure is also effective when a spacer section is used, typically repeated sequences of thymine or adenine placed between the ssST and the nano-object, to address the most conventional experimental protocols. We systematically discuss the fundamental statistics of DNA sequences considering complementarities limited to two (or three) adjacent pairs to avoid self-folding and hybridization of identical strands due to unwanted complements and mismatching. The optimized DNA sequences can reach maximum lengths of 9 to 34 bases depending on the level of applied constraints. The thermodynamic properties of the allowed sequences are used to develop a ranking for each design. For instance, we show that the maximum melting temperature saturates with 14 bases under typical solvation and concentration conditions. Thus, DNA ssST with optimized sequences are developed for segments ranging from 4 to 40 bases, providing a very useful guide for all technological protocols. An experimental test is presented and discussed using the aggregation of Al and CuO nanoparticles and is shown to validate and illustrate the importance of the proposed DNA coding sequence optimization. PMID:27578445
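
    The complementarity constraint mentioned in this abstract can be illustrated with a short Python sketch. It is a simplified stand-in, not the optimization algorithm of the paper: it only tests whether any k-base segment of a strand has its reverse complement somewhere in the same strand, which is taken here as a proxy for self-folding or hybridization of identical strands. The function names and the choice of k are assumptions made for illustration.

```python
# Watson-Crick complement table for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_self_complement(seq, k):
    """True if some k-mer of seq has its reverse complement also in seq.

    Palindromic k-mers (equal to their own reverse complement) count too,
    since two identical strands could pair through them.
    """
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return any(reverse_complement(kmer) in kmers for kmer in kmers)


if __name__ == "__main__":
    for candidate in ("AAGCTT", "ACAC"):
        print(candidate, has_self_complement(candidate, 3))
```

    A candidate sticky end would be rejected under this simplified rule when the check returns True for k one larger than the allowed number of adjacent complementary pairs.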

  20. Recurrent Coding Sequence Variation Explains Only A Small Fraction of the Genetic Architecture of Colorectal Cancer.

    PubMed

    Timofeeva, Maria N; Kinnersley, Ben; Farrington, Susan M; Whiffin, Nicola; Palles, Claire; Svinti, Victoria; Lloyd, Amy; Gorman, Maggie; Ooi, Li-Yin; Hosking, Fay; Barclay, Ella; Zgaga, Lina; Dobbins, Sara; Martin, Lynn; Theodoratou, Evropi; Broderick, Peter; Tenesa, Albert; Smillie, Claire; Grimes, Graeme; Hayward, Caroline; Campbell, Archie; Porteous, David; Deary, Ian J; Harris, Sarah E; Northwood, Emma L; Barrett, Jennifer H; Smith, Gillian; Wolf, Roland; Forman, David; Morreau, Hans; Ruano, Dina; Tops, Carli; Wijnen, Juul; Schrumpf, Melanie; Boot, Arnoud; Vasen, Hans F A; Hes, Frederik J; van Wezel, Tom; Franke, Andre; Lieb, Wolgang; Schafmayer, Clemens; Hampe, Jochen; Buch, Stephan; Propping, Peter; Hemminki, Kari; Försti, Asta; Westers, Helga; Hofstra, Robert; Pinheiro, Manuela; Pinto, Carla; Teixeira, Manuel; Ruiz-Ponte, Clara; Fernández-Rozadilla, Ceres; Carracedo, Angel; Castells, Antoni; Castellví-Bel, Sergi; Campbell, Harry; Bishop, D Timothy; Tomlinson, Ian P M; Dunlop, Malcolm G; Houlston, Richard S

    2015-01-01

    Whilst common genetic variation in many non-coding genomic regulatory regions is known to impart risk of colorectal cancer (CRC), much of the heritability of CRC remains unexplained. To examine the role of recurrent coding sequence variation in CRC aetiology, we genotyped 12,638 CRC cases and 29,045 controls from six European populations. Single-variant analysis identified a coding variant (rs3184504) in SH2B3 (12q24) associated with CRC risk (OR = 1.08, P = 3.9 × 10^-7), and novel damaging coding variants in 3 genes previously tagged by GWAS efforts; rs16888728 (8q24) in UTP23 (OR = 1.15, P = 1.4 × 10^-7); rs6580742 and rs12303082 (12q13) in FAM186A (OR = 1.11, P = 1.2 × 10^-7 and OR = 1.09, P = 7.4 × 10^-8); rs1129406 (12q13) in ATF1 (OR = 1.11, P = 8.3 × 10^-9), all reaching exome-wide significance levels. Gene based tests identified associations between CRC and PCDHGA genes (P < 2.90 × 10^-6). We found an excess of rare, damaging variants in base-excision (P = 2.4 × 10^-4) and DNA mismatch repair genes (P = 6.1 × 10^-4) consistent with a recessive mode of inheritance. This study comprehensively explores the contribution of coding sequence variation to CRC risk, identifying associations with coding variation in 4 genes and the PCDHG gene cluster and several candidate recessive alleles. However, these findings suggest that recurrent, low-frequency coding variants account for a minority of the unexplained heritability of CRC. PMID:26553438

  1. Recurrent Coding Sequence Variation Explains Only A Small Fraction of the Genetic Architecture of Colorectal Cancer

    PubMed Central

    Timofeeva, Maria N.; Kinnersley, Ben; Farrington, Susan M.; Whiffin, Nicola; Palles, Claire; Svinti, Victoria; Lloyd, Amy; Gorman, Maggie; Ooi, Li-Yin; Hosking, Fay; Barclay, Ella; Zgaga, Lina; Dobbins, Sara; Martin, Lynn; Theodoratou, Evropi; Broderick, Peter; Tenesa, Albert; Smillie, Claire; Grimes, Graeme; Hayward, Caroline; Campbell, Archie; Porteous, David; Deary, Ian J.; Harris, Sarah E.; Northwood, Emma L.; Barrett, Jennifer H.; Smith, Gillian; Wolf, Roland; Forman, David; Morreau, Hans; Ruano, Dina; Tops, Carli; Wijnen, Juul; Schrumpf, Melanie; Boot, Arnoud; Vasen, Hans F A; Hes, Frederik J.; van Wezel, Tom; Franke, Andre; Lieb, Wolgang; Schafmayer, Clemens; Hampe, Jochen; Buch, Stephan; Propping, Peter; Hemminki, Kari; Försti, Asta; Westers, Helga; Hofstra, Robert; Pinheiro, Manuela; Pinto, Carla; Teixeira, Manuel; Ruiz-Ponte, Clara; Fernández-Rozadilla, Ceres; Carracedo, Angel; Castells, Antoni; Castellví-Bel, Sergi; Campbell, Harry; Bishop, D. Timothy; Tomlinson, Ian P M; Dunlop, Malcolm G.; Houlston, Richard S.

    2015-01-01

    Whilst common genetic variation in many non-coding genomic regulatory regions is known to impart risk of colorectal cancer (CRC), much of the heritability of CRC remains unexplained. To examine the role of recurrent coding sequence variation in CRC aetiology, we genotyped 12,638 CRC cases and 29,045 controls from six European populations. Single-variant analysis identified a coding variant (rs3184504) in SH2B3 (12q24) associated with CRC risk (OR = 1.08, P = 3.9 × 10^-7), and novel damaging coding variants in 3 genes previously tagged by GWAS efforts; rs16888728 (8q24) in UTP23 (OR = 1.15, P = 1.4 × 10^-7); rs6580742 and rs12303082 (12q13) in FAM186A (OR = 1.11, P = 1.2 × 10^-7 and OR = 1.09, P = 7.4 × 10^-8); rs1129406 (12q13) in ATF1 (OR = 1.11, P = 8.3 × 10^-9), all reaching exome-wide significance levels. Gene based tests identified associations between CRC and PCDHGA genes (P < 2.90 × 10^-6). We found an excess of rare, damaging variants in base-excision (P = 2.4 × 10^-4) and DNA mismatch repair genes (P = 6.1 × 10^-4) consistent with a recessive mode of inheritance. This study comprehensively explores the contribution of coding sequence variation to CRC risk, identifying associations with coding variation in 4 genes and the PCDHG gene cluster and several candidate recessive alleles. However, these findings suggest that recurrent, low-frequency coding variants account for a minority of the unexplained heritability of CRC. PMID:26553438

  2. The amino acid sequence of Escherichia coli cyanase.

    PubMed

    Chin, C C; Anderson, P M; Wold, F

    1983-01-10

    The amino acid sequence of the enzyme cyanase (cyanate hydrolase) from Escherichia coli has been determined by automatic Edman degradation of the intact protein and of its component peptides. The primary peptides used in the sequencing were produced by cyanogen bromide cleavage at the methionine residues, yielding 4 peptides plus free homoserine from the NH2-terminal methionine, and by trypsin cleavage at the 7 arginine residues after acetylation of the lysines. Secondary peptides required for overlaps and COOH-terminal sequences were produced by chymotrypsin or clostripain cleavage of some of the larger peptides. The complete sequence of the cyanase subunit consists of 156 amino acid residues (Mr 16,350). Based on the observation that the cysteine-containing peptide is obtained as a disulfide-linked dimer, it is proposed that the covalent structure of cyanase is made up of two subunits linked by a disulfide bond between the single cystine residue in each subunit. The native enzyme (Mr 150,000) then appears to be a complex of four or five such subunit dimers.

  3. The Complete Genome Sequence of the Lactic Acid Bacterium Lactococcus lactis ssp. lactis IL1403

    PubMed Central

    Bolotin, Alexander; Wincker, Patrick; Mauger, Stéphane; Jaillon, Olivier; Malarme, Karine; Weissenbach, Jean; Ehrlich, S. Dusko; Sorokin, Alexei

    2001-01-01

    Lactococcus lactis is a nonpathogenic AT-rich gram-positive bacterium closely related to the genus Streptococcus and is the most commonly used cheese starter. It is also the best-characterized lactic acid bacterium. We sequenced the genome of the laboratory strain IL1403, using a novel two-step strategy that comprises diagnostic sequencing of the entire genome and a shotgun polishing step. The genome contains 2,365,589 base pairs and encodes 2310 proteins, including 293 protein-coding genes belonging to six prophages and 43 insertion sequence (IS) elements. Nonrandom distribution of IS elements indicates that the chromosome of the sequenced strain may be a product of recent recombination between two closely related genomes. A complete set of late competence genes is present, indicating the ability of L. lactis to undergo DNA transformation. Genomic sequence revealed new possibilities for fermentation pathways and for aerobic respiration. It also indicated a horizontal transfer of genetic information from Lactococcus to gram-negative enteric bacteria of Salmonella-Escherichia group. [The sequence data described in this paper has been submitted to the GenBank data library under accession no. AE005176.] PMID:11337471

  4. Bar Coding Platforms for Nucleic Acid and Protein Detection

    NASA Astrophysics Data System (ADS)

    Müller, Uwe R.

    A variety of novel bar coding systems has been developed as multiplex testing platforms for applications in biological, chemical, and biomedical diagnostics. Instead of identifying a target through capture at a specific locus on an array, target analytes are captured by a bar coded tag, which then uniquely identifies the target, akin to putting a UPC bar code on a product. This requires an appropriate surface functionalization to ensure that the correct target is captured with high efficiency. Moreover the tag, or bar code, has to be readable with minimal error and at high speed, typically by flow analysis. For quantitative assays the target may be labeled separately, or the tag may also serve as the label. A great variety of materials and physicochemical principles has been exploited to generate this plethora of novel bar coding platforms. Their advantages compared to microarray-based assay platforms include in-solution binding kinetics, flexibility in assay design, compatibility with microplate-based assay automation, high sample throughput, and with some assay formats, increased sensitivity.

  5. Rare coding mutations identified by sequencing of Alzheimer disease genome‐wide association studies loci

    PubMed Central

    Vardarajan, Badri N.; Ghani, Mahdi; Kahn, Amanda; Sheikh, Stephanie; Sato, Christine; Barral, Sandra; Lee, Joseph H.; Cheng, Rong; Reitz, Christiane; Lantigua, Rafael; Reyes‐Dumeyer, Dolly; Medrano, Martin; Jimenez‐Velazquez, Ivonne Z.; Rogaeva, Ekaterina; St George‐Hyslop, Peter

    2015-01-01

    Objective: To detect rare coding variants underlying loci detected by genome‐wide association studies (GWAS) of late onset Alzheimer disease (LOAD). Methods: We conducted targeted sequencing of ABCA7, BIN1, CD2AP, CLU, CR1, EPHA1, MS4A4A/MS4A6A, and PICALM in 3 independent LOAD cohorts: 176 patients from 124 Caribbean Hispanic families, 120 patients and 33 unaffected individuals from the 129 National Institute on Aging LOAD Family Study; and 263 unrelated Canadian individuals of European ancestry (210 sporadic patients and 53 controls). Rare coding variants found in at least 2 data sets were genotyped in independent groups of ancestry‐matched controls. Additionally, the Exome Aggregation Consortium was used as a reference data set for population‐based allele frequencies. Results: Overall we detected a statistically significant 3.1‐fold enrichment of the nonsynonymous mutations in the Caucasian LOAD cases compared with controls (p = 0.002) and no difference in synonymous variants. A stop‐gain mutation in ABCA7 (E1679X) and missense mutation in CD2AP (K633R) were highly significant in Caucasian LOAD cases, and mutations in EPHA1 (P460L) and BIN1 (K358R) were significant in Caribbean Hispanic families with LOAD. The EPHA1 variant segregated completely in an extended Caribbean Hispanic family and was also nominally significant in the Caucasians. Additionally, BIN1 (K358R) segregated in 2 of the 6 Caribbean Hispanic families where the mutations were discovered. Interpretation: Targeted sequencing of confirmed GWAS loci revealed an excess burden of deleterious coding mutations in LOAD, with the greatest burden observed in ABCA7 and BIN1. Identifying coding variants in LOAD will facilitate the creation of tractable models for investigation of disease‐related mechanisms and potential therapies. Ann Neurol 2015;78:487–498 PMID:26101835

  6. Evolutionary Patterns in the Sequence and Structure of Transfer RNA: A Window into Early Translation and the Genetic Code

    PubMed Central

    Sun, Feng-Jie; Caetano-Anollés, Gustavo

    2008-01-01

    Transfer RNA (tRNA) molecules play vital roles during protein synthesis. Their acceptor arms are aminoacylated with specific amino acid residues while their anticodons delimit codon specificity. The history of these two functions has been generally linked in evolutionary studies of the genetic code. However, these functions could have been differentially recruited as evolutionary signatures were left embedded in tRNA molecules. Here we built phylogenies derived from the sequence and structure of tRNA, we forced taxa into monophyletic groups using constraint analyses, tested competing evolutionary hypotheses, and generated timelines of amino acid charging and codon discovery. Charging of Sec, Tyr, Ser and Leu appeared ancient, while specificities related to Asn, Met, and Arg were derived. The timelines also uncovered an early role of the second and then first codon bases, identified codons for Ala and Pro as the most ancient, and revealed important evolutionary take-overs related to the loss of the long variable arm in tRNA. The lack of correlation between ancestries of amino acid charging and encoding indicated that the separate discoveries of these functions reflected independent histories of recruitment. These histories were probably curbed by co-options and important take-overs during early diversification of the living world. PMID:18665254

  7. Quantum-Sequencing: Biophysics of quantum tunneling through nucleic acids

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    Tunneling microscopy and spectroscopy have been used extensively in the physical surface sciences to study quantum tunneling, to measure the electronic local density of states of nanomaterials, and to characterize adsorbed species. Quantum-Sequencing (Q-Seq) is a new method based on tunneling microscopy for electronic sequencing of single molecules of nucleic acids. A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free single-molecule sequencing method. Here, we present the unique "electronic fingerprints" for all nucleotides on DNA and RNA using Q-Seq, along with their intrinsic biophysical parameters. We have analyzed tunneling spectra for the nucleotides at different pH conditions and analyzed the HOMO, LUMO and energy gap for all of them. In addition, we show a number of biophysical parameters to further characterize all nucleobases (electron and hole transition voltage and energy barriers). These results highlight the robustness of Q-Seq as a technique for next-generation sequencing.

  8. XY female with a dysgerminoma and no mutation in the coding sequence of the SRY gene.

    PubMed

    Morerio, Cristina; Calvari, Vladimiro; Rosanda, Cristina; Porta, Simona; Gambini, Claudio; Panarello, Claudio

    2002-07-01

    We report a 46,XY 11-year-old girl with pure gonadal dysgenesis who developed a dysgerminoma. The testis-determining gene SRY, a candidate for sex reversal, whose alterations seem to correlate with dysgerminoma, was analyzed and found to be normal; its coding sequence was negative for deletions and mutations. DMRT-1 gene mapping on 9p and DAX-1 on Xp21 were also normal. These results suggest the involvement of other genes in sex reversal and call into question the putative relationship between SRY alterations and dysgerminoma.

  9. Detection of almond allergen coding sequences in processed foods by real time PCR.

    PubMed

    Prieto, Nuria; Iniesto, Elisa; Burbano, Carmen; Cabanillas, Beatriz; Pedrosa, Mercedes M; Rovira, Mercè; Rodríguez, Julia; Muzquiz, Mercedes; Crespo, Jesus F; Cuadrado, Carmen; Linacero, Rosario

    2014-06-18

    The aim of this work was to develop and analytically validate a quantitative RT-PCR method, using novel primer sets designed on Pru du 1, Pru du 3, Pru du 4, and Pru du 6 allergen-coding sequences, and contrast the sensitivity and specificity of these probes. The influence of temperature and/or pressure processing on the ability to detect these almond allergen targets was also analyzed. All primers allowed a specific and accurate amplification of these sequences. The specificity was assessed by amplifying DNA from almond, different Prunus species and other common plant food ingredients. The detection limit was 1 ppm in unprocessed almond kernels. The method's robustness and sensitivity were confirmed using spiked samples. Thermal treatment under pressure (autoclave) reduced yield and amplifiability of almond DNA; however, high-hydrostatic pressure treatments did not produce such effects. Compared with ELISA assay outcomes, this RT-PCR showed higher sensitivity to detect almond traces in commercial foodstuffs. PMID:24857239

  10. Chemical interactions between amino acid and RNA: multiplicity of the levels of specificity explains origin of the genetic code

    NASA Astrophysics Data System (ADS)

    Seligmann, Hervé; Amzallag, Nissim

    2002-11-01

    The emergence of the genetic code remains an enigma. Proposed mechanisms are based on random, historical, thermodynamic and natural selection. However, they introduce chance as a key factor for overcoming the difficulties encountered by the model. We propose here a model in which three successive levels of chemical specificity generated the nucleotide assignments of amino acids in the genetic code. The first level results from hydrophobic and stereospecific interactions between amino acids and short oligonucleotides (termed oligons). The second and third levels of specificity are determined by conditions of energy transfer from loaded oligons (amino acid-oligomer covalently linked) to formation of phosphodiester bond (second level of specificity) and peptidic bond (third level of specificity), while these reactions are catalyzed by RNA templates. This model is sustained by the relationships observed between dipole moments of the nucleotides (forming the anticodon) and reactivity of the amino acyl linkage of the loaded oligon. Moreover, analysis of modern tRNAs reveals that they were probably generated by loose duplication of the nucleotide sequence forming the oligons, after emergence of the 'genetic code.' Indeed, the similarity of nucleotide composition with that of the anticodon decreases with the tRNA domain's distance from the anticodon, but the acceptor stem is relatively more similar to the anticodon than other stems closer to it. This would be because energy transfer constraints that existed between anticodon and amino acid in prebiotic loaded oligonucleotides still affect the structures of modern tRNA acceptor stems. In the model presented, the genetic code is inherent to the most archaic 'molecular physiology' in protolife, even before emergence of a functional 'protein world.' Simple physical processes, in which a level of specificity is integrated in an emerging meta-structure expressing new properties, generate a parsimonious and realistic explanation

  11. Nucleic and amino acid sequences relating to a novel transketolase, and methods for the expression thereof

    DOEpatents

    Croteau, Rodney Bruce; Wildung, Mark Raymond; Lange, Bernd Markus; McCaskill, David G.

    2001-01-01

    cDNAs encoding 1-deoxyxylulose-5-phosphate synthase from peppermint (Mentha piperita) have been isolated and sequenced, and the corresponding amino acid sequences have been determined. Accordingly, isolated DNA sequences (SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7) are provided which code for the expression of 1-deoxyxylulose-5-phosphate synthase from plants. In another aspect the present invention provides for isolated, recombinant DXPS proteins, such as the proteins having the sequences set forth in SEQ ID NO:4, SEQ ID NO:6 and SEQ ID NO:8. In other aspects, replicable recombinant cloning vehicles are provided which code for plant 1-deoxyxylulose-5-phosphate synthases, or for a base sequence sufficiently complementary to at least a portion of 1-deoxyxylulose-5-phosphate synthase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding a plant 1-deoxyxylulose-5-phosphate synthase. Thus, systems and methods are provided for the recombinant expression of the aforementioned recombinant 1-deoxyxylulose-5-phosphate synthase that may be used to facilitate its production, isolation and purification in significant amounts. Recombinant 1-deoxyxylulose-5-phosphate synthase may be used to obtain expression or enhanced expression of 1-deoxyxylulose-5-phosphate synthase in plants in order to enhance the production of 1-deoxyxylulose-5-phosphate, or its derivatives such as isopentenyl diphosphate (IPP), or may be otherwise employed for the regulation or expression of 1-deoxyxylulose-5-phosphate synthase, or the production of its products.

  12. ANTICALIgN: visualizing, editing and analyzing combined nucleotide and amino acid sequence alignments for combinatorial protein engineering.

    PubMed

    Jarasch, Alexander; Kopp, Melanie; Eggenstein, Evelyn; Richter, Antonia; Gebauer, Michaela; Skerra, Arne

    2016-07-01

    ANTICALIgN is an interactive software developed to simultaneously visualize, analyze and modify alignments of DNA and/or protein sequences that arise during combinatorial protein engineering, design and selection. ANTICALIgN combines powerful functions known from currently available sequence analysis tools with unique features for protein engineering, in particular the possibility to display and manipulate nucleotide sequences and their translated amino acid sequences at the same time. ANTICALIgN offers both template-based multiple sequence alignment (MSA), using the unmutated protein as reference, and conventional global alignment, to compare sequences that share an evolutionary relationship. The application of similarity-based clustering algorithms facilitates the identification of duplicates or of conserved sequence features among a set of selected clones. Imported nucleotide sequences from DNA sequence analysis are automatically translated into the corresponding amino acid sequences and displayed, offering numerous options for selecting reading frames, highlighting of sequence features and graphical layout of the MSA. The MSA complexity can be reduced by hiding the conserved nucleotide and/or amino acid residues, thus putting emphasis on the relevant mutated positions. ANTICALIgN is also able to handle suppressed stop codons or even to incorporate non-natural amino acids into a coding sequence. We demonstrate crucial functions of ANTICALIgN in an example of Anticalins selected from a lipocalin random library against the fibronectin extradomain B (ED-B), an established marker of tumor vasculature. Apart from engineered protein scaffolds, ANTICALIgN provides a powerful tool in the area of antibody engineering and for directed enzyme evolution.
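
    The translation of nucleotide sequences in a chosen reading frame, with optional handling of a suppressed stop codon, can be sketched in Python as follows. This is not ANTICALIgN code and says nothing about how that software is implemented: the function name `translate`, its parameters, and the example of reading the amber codon TAG as glutamine are assumptions made for illustration. Only the standard genetic code is used.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0, suppress=None):
    """Translate DNA in reading frame 0, 1 or 2; '*' marks a stop codon.

    `suppress` optionally maps stop codons to the residue inserted by a
    suppressor tRNA (hypothetical example: {"TAG": "Q"}).
    """
    table = dict(CODON_TABLE)
    if suppress:
        table.update(suppress)
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    # Codons with ambiguous bases are reported as 'X'.
    return "".join(table.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
    print(translate("ATGGCCTAA", frame=1))
    print(translate("ATGTAGGCC", suppress={"TAG": "Q"}))
```

    Trailing bases that do not fill a complete codon are ignored, so the three frames of one sequence can differ in length by one residue.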

  13. RT-PCR amplification of the complete NF1 coding sequence

    SciTech Connect

    Ming Hong Shen; Meena Upadhyaya

    1994-09-01

    Neurofibromatosis type 1 (NF1) is a common autosomal dominant disorder. The NF1 gene is a large gene, 350 kb in size, with at least 51 exons. It has proved hard to detect mutations in the gene by examining genomic DNA due to the high mutation rate and the large size of the gene. Since the cloning of the gene, only 45 causative mutations have been reported from over 500 unrelated NF1 patients screened. The coding sequence of the NF1 gene is approximately 3% of the genomic sequence; it will therefore be easier to search for unknown mutations by the study of mRNA. We describe a simple RT-PCR-based strategy to amplify the total coding sequence of the NF1 transcript from peripheral blood lymphocyte RNA. This strategy involves an initial cDNA synthesis step utilizing a set of random hexamers, followed by two consecutive rounds of PCR amplifications. The first round of amplification was performed using four NF1-specific nested primer pairs. This amplification allows the construction of overlapping fragments which span an 8694 bp cDNA sequence of the gene. For mutation analysis, the amplified products or their digests were subjected to electrophoresis on Hydrolink gels. Two disease-causing mutations, a 3 bp deletion in exon 17 and a 10 bp deletion in exon 44, originally detected in the genomic DNA from two unrelated NF1 patients, have been confirmed at the RNA level. The combination of this strategy with other established techniques such as SSCP, chemical cleavage of mismatch, protein truncation test (PTT) and quantitative PCR should greatly facilitate mutation and expression analyses in the NF1 gene.

  14. Genome Sequence Analysis of the Naphthenic Acid Degrading and Metal Resistant Bacterium Cupriavidus gilardii CR3

    PubMed Central

    Xiao, Jingfa; Hao, Lirui; Crowley, David E.; Zhang, Zhewen; Yu, Jun; Huang, Ning; Huo, Mingxin; Wu, Jiayan

    2015-01-01

    Cupriavidus sp. are generally heavy metal tolerant bacteria with the ability to degrade a variety of aromatic hydrocarbon compounds, although the degradation pathways and substrate versatilities remain largely unknown. Here we studied the bacterium Cupriavidus gilardii strain CR3, which was isolated from a natural asphalt deposit, and which was shown to utilize naphthenic acids as a sole carbon source. Genome sequencing of C. gilardii CR3 was carried out to elucidate possible mechanisms for the naphthenic acid biodegradation. The genome of C. gilardii CR3 was composed of two circular chromosomes chr1 and chr2 of respectively 3,539,530 bp and 2,039,213 bp in size. The genome for strain CR3 encoded 4,502 putative protein-coding genes, 59 tRNA genes, and many other non-coding genes. Many genes were associated with xenobiotic biodegradation and metal resistance functions. Pathway prediction for degradation of cyclohexanecarboxylic acid, a representative naphthenic acid, suggested that naphthenic acid undergoes initial ring-cleavage, after which the ring fission products can be degraded via several plausible degradation pathways including a mechanism similar to that used for fatty acid oxidation. The final metabolic products of these pathways are unstable or volatile compounds that were not toxic to CR3. Strain CR3 was also shown to have tolerance to at least 10 heavy metals, which was mainly achieved by self-detoxification through ion efflux, metal-complexation and metal-reduction, and a powerful DNA self-repair mechanism. Our genomic analysis suggests that CR3 is well adapted to survive the harsh environment in natural asphalts containing naphthenic acids and high concentrations of heavy metals. PMID:26301592

  15. Genome Sequence Analysis of the Naphthenic Acid Degrading and Metal Resistant Bacterium Cupriavidus gilardii CR3.

    PubMed

    Wang, Xiaoyu; Chen, Meili; Xiao, Jingfa; Hao, Lirui; Crowley, David E; Zhang, Zhewen; Yu, Jun; Huang, Ning; Huo, Mingxin; Wu, Jiayan

    2015-01-01

    Cupriavidus sp. are generally heavy metal tolerant bacteria with the ability to degrade a variety of aromatic hydrocarbon compounds, although the degradation pathways and substrate versatilities remain largely unknown. Here we studied the bacterium Cupriavidus gilardii strain CR3, which was isolated from a natural asphalt deposit, and which was shown to utilize naphthenic acids as a sole carbon source. Genome sequencing of C. gilardii CR3 was carried out to elucidate possible mechanisms for the naphthenic acid biodegradation. The genome of C. gilardii CR3 was composed of two circular chromosomes chr1 and chr2 of respectively 3,539,530 bp and 2,039,213 bp in size. The genome for strain CR3 encoded 4,502 putative protein-coding genes, 59 tRNA genes, and many other non-coding genes. Many genes were associated with xenobiotic biodegradation and metal resistance functions. Pathway prediction for degradation of cyclohexanecarboxylic acid, a representative naphthenic acid, suggested that naphthenic acid undergoes initial ring-cleavage, after which the ring fission products can be degraded via several plausible degradation pathways including a mechanism similar to that used for fatty acid oxidation. The final metabolic products of these pathways are unstable or volatile compounds that were not toxic to CR3. Strain CR3 was also shown to have tolerance to at least 10 heavy metals, which was mainly achieved by self-detoxification through ion efflux, metal-complexation and metal-reduction, and a powerful DNA self-repair mechanism. Our genomic analysis suggests that CR3 is well adapted to survive the harsh environment in natural asphalts containing naphthenic acids and high concentrations of heavy metals. PMID:26301592

  16. Sequence coding for the alphavirus nonstructural proteins is interrupted by an opal termination codon.

    PubMed Central

    Strauss, E G; Rice, C M; Strauss, J H

    1983-01-01

    We have obtained the nucleotide sequence of the genomic RNAs of two alphaviruses, Sindbis virus and Middelburg virus, over an extensive region encoding the nonstructural (replicase) proteins. In both viruses in an equivalent position an opal (UGA) termination codon punctuates a long otherwise open reading frame. The nonstructural proteins are translated as polyprotein precursors that are processed by posttranslational cleavage into four polypeptide chains; the sequence data presented here indicate that the COOH-terminal polypeptide, ns72, may be produced by read-through of this opal codon. The high degree of amino acid homology between the ns72 polypeptides of the two viruses, in contrast to the lack of conserved sequence upstream from the read-through site, suggests that ns72 plays an important role in viral replication, possibly modulating the action of other replicase components. PMID:6577423

  17. Whole-Exome Sequencing Identifies Rare and Low-Frequency Coding Variants Associated with LDL Cholesterol

    PubMed Central

    Lange, Leslie A.; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M.; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M.; Smith, Joshua D.; Turner, Emily H.; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A.; Holmen, Oddgeir L.; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A.; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C.; Correa, Adolfo; Griswold, Michael E.; Jakobsdottir, Johanna; Smith, Albert V.; Schreiner, Pamela J.; Feitosa, Mary F.; Zhang, Qunyuan; Huffman, Jennifer E.; Crosby, Jacy; Wassel, Christina L.; Do, Ron; Franceschini, Nora; Martin, Lisa W.; Robinson, Jennifer G.; Assimes, Themistocles L.; Crosslin, David R.; Rosenthal, Elisabeth A.; Tsai, Michael; Rieder, Mark J.; Farlow, Deborah N.; Folsom, Aaron R.; Lumley, Thomas; Fox, Ervin R.; Carlson, Christopher S.; Peters, Ulrike; Jackson, Rebecca D.; van Duijn, Cornelia M.; Uitterlinden, André G.; Levy, Daniel; Rotter, Jerome I.; Taylor, Herman A.; Gudnason, Vilmundur; Siscovick, David S.; Fornage, Myriam; Borecki, Ingrid B.; Hayward, Caroline; Rudan, Igor; Chen, Y. Eugene; Bottinger, Erwin P.; Loos, Ruth J.F.; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M.; Gabriel, Stacey B.; O’Donnell, Christopher J.; Post, Wendy S.; North, Kari E.; Reiner, Alexander P.; Boerwinkle, Eric; Psaty, Bruce M.; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P.; Cupples, L. Adrienne; Kooperberg, Charles; Wilson, James G.; Nickerson, Deborah A.; Abecasis, Goncalo R.; Rich, Stephen S.; Tracy, Russell P.; Willer, Cristen J.; Gabriel, Stacey B.; Altshuler, David M.; Abecasis, Gonçalo R.; Allayee, Hooman; Cresci, Sharon; Daly, Mark J.; de Bakker, Paul I.W.; DePristo, Mark A.; Do, Ron; Donnelly, Peter; Farlow, Deborah N.; Fennell, Tim; Garimella, Kiran; Hazen, Stanley L.; Hu, Youna; Jordan, Daniel M.; Jun, Goo; Kathiresan, Sekar; Kang, Hyun Min; Kiezun, Adam; Lettre, Guillaume; Li, Bingshan; Li, Mingyao; Newton-Cheh, Christopher H.; Padmanabhan, Sandosh; Peloso, Gina; Pulit, Sara; Rader, Daniel J.; Reich, David; Reilly, Muredach P.; Rivas, Manuel A.; Schwartz, Steve; Scott, Laura; Siscovick, David S.; Spertus, John A.; Stitziel, Nathaniel O.; Stoletzki, Nina; Sunyaev, Shamil R.; Voight, Benjamin F.; Willer, Cristen J.; Rich, Stephen S.; Akylbekova, Ermeg; Atwood, Larry D.; Ballantyne, Christie M.; Barbalic, Maja; Barr, R. Graham; Benjamin, Emelia J.; Bis, Joshua; Boerwinkle, Eric; Bowden, Donald W.; Brody, Jennifer; Budoff, Matthew; Burke, Greg; Buxbaum, Sarah; Carr, Jeff; Chen, Donna T.; Chen, Ida Y.; Chen, Wei-Min; Concannon, Pat; Crosby, Jacy; Cupples, L. Adrienne; D’Agostino, Ralph; DeStefano, Anita L.; Dreisbach, Albert; Dupuis, Josée; Durda, J. Peter; Ellis, Jaclyn; Folsom, Aaron R.; Fornage, Myriam; Fox, Caroline S.; Fox, Ervin; Funari, Vincent; Ganesh, Santhi K.; Gardin, Julius; Goff, David; Gordon, Ora; Grody, Wayne; Gross, Myron; Guo, Xiuqing; Hall, Ira M.; Heard-Costa, Nancy L.; Heckbert, Susan R.; Heintz, Nicholas; Herrington, David M.; Hickson, DeMarc; Huang, Jie; Hwang, Shih-Jen; Jacobs, David R.; Jenny, Nancy S.; Johnson, Andrew D.; Johnson, Craig W.; Kawut, Steven; Kronmal, Richard; Kurz, Raluca; Lange, Ethan M.; Lange, Leslie A.; Larson, Martin G.; Lawson, Mark; Lewis, Cora E.; Levy, Daniel; Li, Dalin; Lin, Honghuang; Liu, Chunyu; Liu, Jiankang; Liu, Kiang; Liu, Xiaoming; Liu, Yongmei; Longstreth, William T.; Loria, Cay; Lumley, Thomas; Lunetta, Kathryn; Mackey, Aaron J.; Mackey, Rachel; Manichaikul, Ani; Maxwell, Taylor; McKnight, Barbara; Meigs, James B.; Morrison, Alanna C.; Musani, Solomon K.; Mychaleckyj, Josyf C.; Nettleton, Jennifer A.; North, Kari; O’Donnell, Christopher J.; O’Leary, Daniel; Ong, Frank; Palmas, Walter; Pankow, James S.; Pankratz, Nathan D.; Paul, Shom; Perez, Marco; Person, Sharina D.; Polak, Joseph; Post, Wendy S.; Psaty, Bruce M.; Quinlan, Aaron R.; Raffel, Leslie J.; Ramachandran, Vasan S.; Reiner, Alexander P.; Rice, Kenneth; Rotter, Jerome I.; Sanders, Jill P.; Schreiner, Pamela; Seshadri, Sudha; Shea, Steve; Sidney, Stephen; Silverstein, Kevin; Smith, Nicholas L.; Sotoodehnia, Nona; Srinivasan, Asoke; Taylor, Herman A.; Taylor, Kent; Thomas, Fridtjof; Tracy, Russell P.; Tsai, Michael Y.; Volcik, Kelly A.; Wassel, Christina L.; Watson, Karol; Wei, Gina; White, Wendy; Wiggins, Kerri L.; Wilk, Jemma B.; Williams, O. Dale; Wilson, Gregory; Wilson, James G.; Wolf, Phillip; Zakai, Neil A.; Hardy, John; Meschia, James F.; Nalls, Michael; Singleton, Andrew; Worrall, Brad; Bamshad, Michael J.; Barnes, Kathleen C.; Abdulhamid, Ibrahim; Accurso, Frank; Anbar, Ran; Beaty, Terri; Bigham, Abigail; Black, Phillip; Bleecker, Eugene; Buckingham, Kati; Cairns, Anne Marie; Caplan, Daniel; Chatfield, Barbara; Chidekel, Aaron; Cho, Michael; Christiani, David C.; Crapo, James D.; Crouch, Julia; Daley, Denise; Dang, Anthony; Dang, Hong; De Paula, Alicia; DeCelie-Germana, Joan; Dozor, Allen; Drumm, Mitch; Dyson, Maynard; Emerson, Julia; Emond, Mary J.; Ferkol, Thomas; Fink, Robert; Foster, Cassandra; Froh, Deborah; Gao, Li; Gershan, William; Gibson, Ronald L.; Godwin, Elizabeth; Gondor, Magdalen; Gutierrez, Hector; Hansel, Nadia N.; Hassoun, Paul M.; Hiatt, Peter; Hokanson, John E.; Howenstine, Michelle; Hummer, Laura K.; Kanga, Jamshed; Kim, Yoonhee; Knowles, Michael R.; Konstan, Michael; Lahiri, Thomas; Laird, Nan; Lange, Christoph; Lin, Lin; Lin, Xihong; Louie, Tin L.; Lynch, David; Make, Barry; Martin, Thomas R.; Mathai, Steve C.; Mathias, Rasika A.; McNamara, John; McNamara, Sharon; Meyers, Deborah; Millard, Susan; Mogayzel, Peter; Moss, Richard; Murray, Tanda; Nielson, Dennis; Noyes, Blakeslee; O’Neal, Wanda; Orenstein, David; O’Sullivan, Brian; Pace, Rhonda; Pare, Peter; Parker, H. Worth; Passero, Mary Ann; Perkett, Elizabeth; Prestridge, Adrienne; Rafaels, Nicholas M.; Ramsey, Bonnie; Regan, Elizabeth; Ren, Clement; Retsch-Bogart, George; Rock, Michael; Rosen, Antony; Rosenfeld, Margaret; Ruczinski, Ingo; Sanford, Andrew; Schaeffer, David; Sell, Cindy; Sheehan, Daniel; Silverman, Edwin K.; Sin, Don; Spencer, Terry; Stonebraker, Jackie; Tabor, Holly K.; Varlotta, Laurie; Vergara, Candelaria I.; Weiss, Robert; Wigley, Fred; Wise, Robert A.; Wright, Fred A.; Wurfel, Mark M.; Zanni, Robert; Zou, Fei; Nickerson, Deborah A.; Rieder, Mark J.; Green, Phil; Shendure, Jay; Akey, Joshua M.; Bustamante, Carlos D.; Crosslin, David R.; Eichler, Evan E.; Fox, P. Keolu; Fu, Wenqing; Gordon, Adam; Gravel, Simon; Jarvik, Gail P.; Johnsen, Jill M.; Kan, Mengyuan; Kenny, Eimear E.; Kidd, Jeffrey M.; Lara-Garduno, Fremiet; Leal, Suzanne M.; Liu, Dajiang J.; McGee, Sean; O’Connor, Timothy D.; Paeper, Bryan; Robertson, Peggy D.; Smith, Joshua D.; Staples, Jeffrey C.; Tennessen, Jacob A.; Turner, Emily H.; Wang, Gao; Yi, Qian; Jackson, Rebecca; Peters, Ulrike; Carlson, Christopher S.; Anderson, Garnet; Anton-Culver, Hoda; Assimes, Themistocles L.; Auer, Paul L.; Beresford, Shirley; Bizon, Chris; Black, Henry; Brunner, Robert; Brzyski, Robert; Burwen, Dale; Caan, Bette; Carty, Cara L.; Chlebowski, Rowan; Cummings, Steven; Curb, J. David; Eaton, Charles B.; Ford, Leslie; Franceschini, Nora; Fullerton, Stephanie M.; Gass, Margery; Geller, Nancy; Heiss, Gerardo; Howard, Barbara V.; Hsu, Li; Hutter, Carolyn M.; Ioannidis, John; Jiao, Shuo; Johnson, Karen C.; Kooperberg, Charles; Kuller, Lewis; LaCroix, Andrea; Lakshminarayan, Kamakshi; Lane, Dorothy; Lasser, Norman; LeBlanc, Erin; Li, Kuo-Ping; Limacher, Marian; Lin, Dan-Yu; Logsdon, Benjamin A.; Ludlam, Shari; Manson, JoAnn E.; Margolis, Karen; Martin, Lisa; McGowan, Joan; Monda, Keri L.; Kotchen, Jane Morley; Nathan, Lauren; Ockene, Judith; O’Sullivan, Mary Jo; Phillips, Lawrence S.; Prentice, Ross L.; Robbins, John; Robinson, Jennifer G.; Rossouw, Jacques E.; Sangi-Haghpeykar, Haleh; Sarto, Gloria E.; Shumaker, Sally; Simon, Michael S.; Stefanick, Marcia L.; Stein, Evan; Tang, Hua; Taylor, Kira C.; Thomson, Cynthia A.; Thornton, Timothy A.; Van Horn, Linda; Vitolins, Mara; Wactawski-Wende, Jean; Wallace, Robert; Wassertheil-Smoller, Sylvia; Zeng, Donglin; Applebaum-Bowden, Deborah; Feolo, Michael; Gan, Weiniu; Paltoo, Dina N.; Sholinsky, Phyliss; Sturcke, Anne

    2014-01-01

    Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments. PMID:24507775

  18. Multiplex iterative plasmid engineering for combinatorial optimization of metabolic pathways and diversification of protein coding sequences.

    PubMed

    Li, Yifan; Gu, Qun; Lin, Zhenquan; Wang, Zhiwen; Chen, Tao; Zhao, Xueming

    2013-11-15

    Engineering complex biological systems typically requires combinatorial optimization to achieve the desired functionality. Here, we present Multiplex Iterative Plasmid Engineering (MIPE), which is a highly efficient and customized method for combinatorial diversification of plasmid sequences. MIPE exploits ssDNA mediated λ Red recombineering for the introduction of mutations, allowing it to target several sites simultaneously and generate libraries of up to 10^7 sequences in one reaction. We also describe "restriction digestion mediated co-selection (RD CoS)", which enables MIPE to produce enhanced recombineering efficiencies with greatly simplified coselection procedures. To demonstrate this approach, we applied MIPE to fine-tune gene expression level in the 5-gene riboflavin biosynthetic pathway and successfully isolated a clone with 2.67-fold improved production in less than a week. We further demonstrated the ability of MIPE for highly multiplexed diversification of protein coding sequence by simultaneously targeting 23 codons scattered along the 750 bp sequence. We anticipate this method to benefit the optimization of diverse biological systems in synthetic biology and metabolic engineering.

  19. Whole-exome sequencing identifies rare and low-frequency coding variants associated with LDL cholesterol.

    PubMed

    Lange, Leslie A; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M; Smith, Joshua D; Turner, Emily H; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-Ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A; Holmen, Oddgeir L; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C; Correa, Adolfo; Griswold, Michael E; Jakobsdottir, Johanna; Smith, Albert V; Schreiner, Pamela J; Feitosa, Mary F; Zhang, Qunyuan; Huffman, Jennifer E; Crosby, Jacy; Wassel, Christina L; Do, Ron; Franceschini, Nora; Martin, Lisa W; Robinson, Jennifer G; Assimes, Themistocles L; Crosslin, David R; Rosenthal, Elisabeth A; Tsai, Michael; Rieder, Mark J; Farlow, Deborah N; Folsom, Aaron R; Lumley, Thomas; Fox, Ervin R; Carlson, Christopher S; Peters, Ulrike; Jackson, Rebecca D; van Duijn, Cornelia M; Uitterlinden, André G; Levy, Daniel; Rotter, Jerome I; Taylor, Herman A; Gudnason, Vilmundur; Siscovick, David S; Fornage, Myriam; Borecki, Ingrid B; Hayward, Caroline; Rudan, Igor; Chen, Y Eugene; Bottinger, Erwin P; Loos, Ruth J F; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M; Gabriel, Stacey B; O'Donnell, Christopher J; Post, Wendy S; North, Kari E; Reiner, Alexander P; Boerwinkle, Eric; Psaty, Bruce M; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P; Cupples, L Adrienne; Kooperberg, Charles; Wilson, James G; Nickerson, Deborah A; Abecasis, Goncalo R; Rich, Stephen S; Tracy, Russell P; Willer, Cristen J

    2014-02-01

    Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments.

  20. Nucleic acid sequence detection using multiplexed oligonucleotide PCR

    SciTech Connect

    Nolan, John P.; White, P. Scott

    2006-12-26

    Methods for rapidly detecting single or multiple sequence alleles in a sample nucleic acid are described. Provided are all of the oligonucleotide pairs capable of annealing specifically to a target allele and discriminating among possible sequences thereof, and ligating to each other to form an oligonucleotide complex when a particular sequence feature is present (or, alternatively, absent) in the sample nucleic acid. The design of each oligonucleotide pair permits the subsequent high-level PCR amplification of a specific amplicon when the oligonucleotide complex is formed, but not when the oligonucleotide complex is not formed. The presence or absence of the specific amplicon is used to detect the allele. Detection of the specific amplicon may be achieved using a variety of methods well known in the art, including without limitation, oligonucleotide capture onto DNA chips or microarrays, oligonucleotide capture onto beads or microspheres, electrophoresis, and mass spectrometry. Various labels and address-capture tags may be employed in the amplicon detection step of multiplexed assays, as further described herein.

  1. Evolutionary and sequence-based relationships in bacterial AdoMet-dependent non-coding RNA methyltransferases

    PubMed Central

    2014-01-01

    Background: RNA post-transcriptional modification is an exciting field of research that has evidenced this editing process as a sophisticated epigenetic mechanism to fine tune the ribosome function and to control gene expression. Although tRNA modifications seem to be more relevant for the ribosome function and cell physiology as a whole, some rRNA modifications have also been seen to play pivotal roles, essentially those located in central ribosome regions. RNA methylation at nucleobases and ribose moieties of nucleotides appears to frequently modulate its chemistry and structure. RNA methyltransferases comprise a superfamily of highly specialized enzymes that accomplish a wide variety of modifications. These enzymes exhibit a poor degree of sequence similarity in spite of using a common reaction cofactor and modifying the same substrate type. Results: Relationships and lineages of RNA methyltransferases have been extensively discussed, but no consensus has been reached. To shed light on this topic, we performed amino acid and codon-based sequence analyses to determine phylogenetic relationships and molecular evolution. We found that most Class I RNA MTases are evolutionarily related to protein and cofactor/vitamin biosynthesis methyltransferases. Additionally, we found that at least nine lineages explain the diversity of RNA MTases. We evidenced that RNA methyltransferases have a high content of polar and positively charged amino acids, which coincides with the electrochemistry of their substrates. Conclusions: After studying almost 12,000 bacterial genomes and 2,000 patho-pangenomes, we revealed that molecular evolution of Class I methyltransferases matches the different rates of synonymous and non-synonymous substitutions along the coding region. Consequently, evolution on Class I methyltransferases selects against amino acid changes affecting the structure conformation. PMID:25012753

  2. Characterization and amino acid sequence of a fatty acid-binding protein from human heart.

    PubMed Central

    Offner, G D; Brecher, P; Sawlivich, W B; Costello, C E; Troxler, R F

    1988-01-01

    The complete amino acid sequence of a fatty acid-binding protein from human heart was determined by automated Edman degradation of CNBr, BNPS-skatole [3'-bromo-3-methyl-2-(2-nitrobenzenesulphenyl)indolenine], hydroxylamine, Staphylococcus aureus V8 proteinase, tryptic and chymotryptic peptides, and by digestion of the protein with carboxypeptidase A. The sequence of the blocked N-terminal tryptic peptide from citraconylated protein was determined by collisionally induced decomposition mass spectrometry. The protein contains 132 amino acid residues, is enriched with respect to threonine and lysine, lacks cysteine, has an acetylated valine residue at the N-terminus, and has an Mr of 14768 and an isoelectric point of 5.25. This protein contains two short internal repeated sequences from residues 48-54 and from residues 114-119 located within regions of predicted beta-structure and decreasing hydrophobicity. These short repeats are contained within two longer repeated regions from residues 48-60 and residues 114-125, which display 62% sequence similarity. These regions could accommodate the charged and uncharged moieties of long-chain fatty acids and may represent fatty acid-binding domains consistent with the finding that human heart fatty acid-binding protein binds 2 mol of oleate or palmitate/mol of protein. Detailed evidence for the amino acid sequences of the peptides has been deposited as Supplementary Publication SUP 50143 (23 pages) at the British Library Lending Division, Boston Spa, Yorkshire LS23 7BQ, U.K., from whom copies may be obtained as indicated in Biochem. J. (1988) 249, 5. PMID:3421901

  3. The sequence of rat leukosialin (W3/13 antigen) reveals a molecule with O-linked glycosylation of one third of its extracellular amino acids.

    PubMed Central

    Killeen, N; Barclay, A N; Willis, A C; Williams, A F

    1987-01-01

    Leukosialin is one of the major glycoproteins of thymocytes and T lymphocytes and is notable for a very high content of O-linked carbohydrate structures. The full protein sequence for rat leukosialin as translated from cDNA clones is now reported. The molecule contains 371 amino acids with 224 residues outside the cell, one transmembrane sequence and 124 cytoplasmic residues. Data from the peptide sequence and carbohydrate composition suggest that one in three of the extracellular amino acids may be O-glycosylated with no N-linked glycosylation sites. The cDNA sequence contained a CpG rich region in the 3' coding sequence and a large 3' non-coding region which included tandem repeats of the sequence GGAT. PMID:2965006

  4. Gamma Peptide Nucleic Acids: As Orthogonal Nucleic Acid Recognition Codes for Organizing Molecular Self-Assembly.

    PubMed

    Sacui, Iulia; Hsieh, Wei-Che; Manna, Arunava; Sahu, Bichismita; Ly, Danith H

    2015-07-01

    Nucleic acids are an attractive platform for organizing molecular self-assembly because of their specific nucleobase interactions and defined length scale. Routinely employed in the organization and assembly of materials in vitro, however, they have rarely been exploited in vivo, due to the concerns for enzymatic degradation and cross-hybridization with the host's genetic materials. Herein we report the development of a tight-binding, orthogonal, synthetically versatile, and informationally interfaced nucleic acid platform for programming molecular interactions, with implications for in vivo molecular assembly and computing. The system consists of three molecular entities: the right-handed and left-handed conformers and a nonhelical domain. The first two are orthogonal to each other in recognition, while the third is capable of binding to both, providing a means for interfacing the two conformers as well as the natural nucleic acid biopolymers (i.e., DNA and RNA). The three molecular entities are prepared from the same monomeric chemical scaffold, with the exception of the stereochemistry or lack thereof at the γ-backbone that determines if the corresponding oligo adopts a right-handed or left-handed helix, or a nonhelical motif. These conformers hybridize to each other with exquisite affinity, sequence selectivity, and level of orthogonality. Recognition modules as short as five nucleotides in length are capable of organizing molecular assembly.

  5. A probabilistic coding based quantum genetic algorithm for multiple sequence alignment.

    PubMed

    Huo, Hongwei; Xie, Qiaoluan; Shen, Xubang; Stojkovic, Vojislav

    2008-01-01

    This paper presents an original Quantum Genetic algorithm for Multiple sequence ALIGNment (QGMALIGN) that combines a genetic algorithm and a quantum algorithm. A quantum probabilistic coding is designed for representing the multiple sequence alignment. A quantum rotation gate is used as a mutation operator to guide the quantum state evolution. Six genetic operators are designed on the coding basis to improve the solution during the evolutionary process. The features of implicit parallelism and state superposition in quantum mechanics and the global search capability of the genetic algorithm are exploited to get efficient computation. A set of well-known test cases from BAliBASE2.0 is used as reference to evaluate the efficiency of the QGMALIGN optimization. The QGMALIGN results have been compared with the results of the most popular methods (CLUSTALX, SAGA, DIALIGN, SB_PIMA, and QGMALIGN). The results show that QGMALIGN performs well on the presented biological data. The addition of genetic operators to the quantum algorithm lowers the cost of overall running time.
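    The rotation-gate step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the textbook form used in quantum-inspired genetic algorithms, where each "qubit" is a pair of real amplitudes (alpha, beta) with alpha^2 + beta^2 = 1 and the gate is a plain 2-D rotation. The function name `rotate_qubit` and the angle values are hypothetical; the abstract does not give the paper's own coding or angle schedule.

```python
import math


def rotate_qubit(alpha: float, beta: float, theta: float) -> tuple:
    """Apply a 2-D rotation gate to one qubit's amplitudes.

    (alpha, beta) are the amplitudes of states 0 and 1; beta**2 is the
    probability of observing 1. A positive theta moves the qubit toward
    state 1, a negative theta toward state 0. This is the generic
    rotation used in quantum-inspired GAs, not the paper's exact operator.
    """
    c, s = math.cos(theta), math.sin(theta)
    return c * alpha - s * beta, s * alpha + c * beta


# Start from an even superposition and nudge it toward state 1.
a0 = b0 = 1 / math.sqrt(2)
a1, b1 = rotate_qubit(a0, b0, 0.05 * math.pi)
print(round(a1 ** 2 + b1 ** 2, 12), b1 > b0)
```

    Because the gate is a rotation, the amplitudes stay normalised after every update, so the pair can always be read as a probability distribution over the two states.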

  6. Characterization of genomic sequence coding for bromelain inhibitors in pineapple and expression of its recombinant isoform.

    PubMed

    Sawano, Yoriko; Muramatsu, Tomonari; Hatano, Ken-ichi; Nagata, Koji; Tanokura, Masaru

    2002-08-01

    Bromelain inhibitor (BI) is a cysteine proteinase inhibitor isolated from pineapple stem (Reddy, M. N., Keim, P. S., Heinrikson, R. L., and Kézdy, F. J. (1975) J. Biol. Chem. 250, 1741-1750). It consists of eight isoinhibitors, and each isoinhibitor has a two-chain structure. In this study, the genomic DNA has been cloned and found to encode a precursor protein with 246 amino acids (Mr of approximately 27,500) containing three isoinhibitor domains (BI-III, -VI, and -VII) that are 93% identical to one another in amino acid sequences. The gene structure indicated that these isoinhibitors are produced by removal of the N-terminal pre-peptide (19 residues), 3 interchain peptides (each 5 residues), 2 interdomain peptides (each 19 residues), and the C-terminal pro-peptide (18 residues). Moreover, all the amino acid sequences of bromelain isoinhibitors could be explained by removal of one or two amino acids from BI-III, -VI, and -VII with exopeptidases. A recombinant single-chain BI-VI with the interchain peptide showed the same bromelain inhibitory activity as the native BI-VI, whereas the recombinant form without the interchain peptide showed none. These results indicate that the interchain peptide plays an important role in the folding process of the mature isoinhibitors. PMID:12016215

  7. Fatty Acid Profile and Unigene-Derived Simple Sequence Repeat Markers in Tung Tree (Vernicia fordii)

    PubMed Central

    Zhang, Lin; Jia, Baoguang; Tan, Xiaofeng; Thammina, Chandra S.; Long, Hongxu; Liu, Min; Wen, Shanna; Song, Xianliang; Cao, Heping

    2014-01-01

    Tung tree (Vernicia fordii) provides the sole source of tung oil widely used in industry. Lack of fatty acid composition and molecular markers hinders biochemical, genetic and breeding research. The objectives of this study were to determine fatty acid profiles and develop unigene-derived simple sequence repeat (SSR) markers in tung tree. Fatty acid profiles of 41 accessions showed that the ratio of α-eleostearic acid was increasing continuously with a parallel trend to the amount of tung oil accumulation while the ratios of other fatty acids were decreasing in different stages of the seeds and that α-eleostearic acid (18∶3) consisted of 77% of the total fatty acids in tung oil. Transcriptome sequencing identified 81,805 unigenes from tung cDNA library constructed using seed mRNA and discovered 6,366 SSRs in 5,404 unigenes. The di- and tri-nucleotide microsatellites accounted for 92% of the SSRs with AG/CT and AAG/CTT being the most abundant SSR motifs. Fifteen polymorphic genic-SSR markers were developed from 98 unigene loci tested in 41 cultivated tung accessions by agarose gel and capillary electrophoresis. Genbank database search identified 10 of them putatively coding for functional proteins. Quantitative PCR demonstrated that all 15 polymorphic SSR-associated unigenes were expressed in tung seeds and some of them were highly correlated with oil composition in the seeds. Dendrogram revealed that most of the 41 accessions were clustered according to the geographic region. These new polymorphic genic-SSR markers will facilitate future studies on genetic diversity, molecular fingerprinting, comparative genomics and genetic mapping in tung tree. The lipid profiles in the seeds of 41 tung accessions will be valuable for biochemical and breeding studies. PMID:25167054

  8. Major Breeding Plumage Color Differences of Male Ruffs (Philomachus pugnax) Are Not Associated With Coding Sequence Variation in the MC1R Gene

    PubMed Central

    Küpper, Clemens; Burke, Terry; Lank, David B.

    2015-01-01

    Sequence variation in the melanocortin-1 receptor (MC1R) gene explains color morph variation in several species of birds and mammals. Ruffs (Philomachus pugnax) exhibit major dark/light color differences in melanin-based male breeding plumage which is closely associated with alternative reproductive behavior. A previous study identified a microsatellite marker (Ppu020) near the MC1R locus associated with the presence/absence of ornamental plumage. We investigated whether coding sequence variation in the MC1R gene explains major dark/light plumage color variation and/or the presence/absence of ornamental plumage in ruffs. Among 821 bp of the MC1R coding region from 44 male ruffs we found 3 single nucleotide polymorphisms, representing 1 nonsynonymous and 2 synonymous substitutions. None were associated with major dark/light color differences or the presence/absence of ornamental plumage. At all amino acid sites known to be functionally important in other avian species with dark/light plumage color variation, ruffs were either monomorphic or the shared polymorphism did not coincide with color morph. Neither ornamental plumage color differences nor the presence/absence of ornamental plumage in ruffs are likely to be caused entirely by amino acid variation within the coding regions of the MC1R locus. Regulatory elements and structural variation at other loci may be involved in melanin expression and contribute to the extreme plumage polymorphism observed in this species. PMID:25534935

  9. Cloning and sequence analysis of the coding sequence of β-actin cDNA from the Chinese alligator and suitable internal reference primers from the β-actin gene.

    PubMed

    Zhu, H N; Zhang, S Z; Zhou, Y K; Wang, C L; Wu, X B

    2015-01-01

    β-Actin is an essential component of the cytoskeleton and is stably expressed in various tissues of animals, thus, it is commonly used as an internal reference for gene expression studies. In this study, a 1731-bp fragment of β-actin cDNA from Alligator sinensis was obtained using the homology cloning technique. Sequence analysis showed that this fragment contained the complete coding sequence of the β-actin gene (1128 bp), encoding 375 amino acids. The amino acid sequence of β-actin is highly conserved and its nucleotide sequence is slightly variable. Multiple alignment analyses showed that the nucleotide sequence of the β-actin gene from A. sinensis is very similar to sequences from birds, with 94-95% identity. Ten pairs of primers with different product sizes and different annealing temperatures were screened by PCR amplification, agarose gel electrophoresis, and DNA sequencing, and could be used as internal reference primers in gene expression studies. This study expands our knowledge of β-actin gene phylogenetic evolution and provides a basis for quantitative gene expression studies in A. sinensis. PMID:26505364
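    The two lengths quoted in this abstract agree with each other once the stop codon is counted: 375 codons for the amino acids plus one stop codon, at three bases each, gives 1128 bp. A small sketch of that arithmetic follows; the helper name `cds_length_bp` is hypothetical and is only meant as a sanity check on reported figures.

```python
def cds_length_bp(n_amino_acids: int, include_stop: bool = True) -> int:
    """Length in base pairs of a coding sequence for a given protein length.

    Each amino acid is encoded by one 3-base codon; a complete coding
    sequence also ends with one stop codon, which encodes no amino acid.
    """
    codons = n_amino_acids + (1 if include_stop else 0)
    return 3 * codons


print(cds_length_bp(375))         # coding sequence including the stop codon
print(cds_length_bp(375, False))  # coding bases only, stop codon excluded
```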

  10. New approaches for computer analysis of nucleic acid sequences.

    PubMed

    Karlin, S; Ghandour, G; Ost, F; Tavare, S; Korn, L J

    1983-09-01

    A new high-speed computer algorithm is outlined that ascertains within and between nucleic acid and protein sequences all direct repeats, dyad symmetries, and other structural relationships. Large repeats, repeats of high frequency, dyad symmetries of specified stem length and loop distance, and their distributions are determined. Significance of homologies is assessed by a hierarchy of permutation procedures. Applications are made to papovaviruses, the human papillomavirus HPV, lambda phage, the human and mouse mitochondrial genomes, and the human and mouse immunoglobulin kappa-chain genes. PMID:6577449
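    To make the two structural features named in this abstract concrete, the following is a naive sketch that finds direct repeats (the same k-mer at more than one position) and dyad symmetries (a stem followed, after a bounded loop, by its reverse complement). It is not the high-speed algorithm the abstract describes; the function names, parameters, and example sequences are all hypothetical, and the input is assumed to be uppercase DNA over A, C, G, T.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def direct_repeats(seq: str, k: int) -> dict:
    """Map each k-mer occurring more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def dyad_symmetries(seq: str, stem: int, max_loop: int) -> list:
    """Return (left_start, right_start) pairs of complementary stems.

    The right stem must equal the reverse complement of the left stem and
    begin between 0 and max_loop bases after the left stem ends.
    """
    hits = []
    n = len(seq)
    for i in range(n - 2 * stem + 1):
        left_rc = reverse_complement(seq[i:i + stem])
        for loop in range(max_loop + 1):
            j = i + stem + loop
            if j + stem > n:
                break  # right stem would run past the end of the sequence
            if seq[j:j + stem] == left_rc:
                hits.append((i, j))
    return hits


print(direct_repeats("GAATTCAAAGAATTC", 6))
print(dyad_symmetries("GGACTTTGTCC", 4, 5))
```

    The repeat search is linear in sequence length for a fixed k, while the dyad search costs roughly sequence length times `max_loop`; both are adequate only for short illustrative inputs.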

  11. A cDNA clone containing the entire coding sequence of a mouse H-2Kd histocompatibility antigen

    PubMed Central

    Lalanne, Jean-Louis; Delarbre, Christiane; Gachelin, Gabriel; Kourilsky, Philippe

    1983-01-01

    We have isolated a cDNA clone carrying a 1560 bp long insert which contains the entire coding and 3′ untranslated regions of an H-2Kd mouse histocompatibility antigen. Its sequence and overall features are described. They point to the existence of unique properties of DNA sequences associated with the H-2Kd antigen. PMID:6298749

  12. Automated conserved non-coding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses

    PubMed Central

    Turco, Gina; Schnable, James C.; Pedersen, Brent; Freeling, Michael

    2013-01-01

    Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize. PMID:23874343

  13. The impact of chromosomal translocation locus and fusion oncogene coding sequence in synovial sarcomagenesis

    PubMed Central

    Jones, Kevin B.; Barrott, Jared J.; Xie, Mingchao; Haldar, Malay; Jin, Huifeng; Zhu, Ju-Fen; Monument, Michael J.; Mosbruger, Tim L.; Langer, Ellen M.; Randall, R. Lor; Wilson, Richard K.; Cairns, Bradley R.; Ding, Li; Capecchi, Mario R.

    2016-01-01

    Synovial sarcomas are aggressive soft-tissue malignancies that express chromosomal translocation-generated fusion genes, SS18-SSX1 or SS18-SSX2 in most cases. Here, we report a mouse sarcoma model expressing SS18-SSX1, complementing our prior model expressing SS18-SSX2. Exome sequencing identified no recurrent secondary mutations in tumors of either genotype. Most of the few mutations identified in single tumors were present in genes that were minimally or not expressed in any of the tumors. Chromosome 6, either entirely or around the fusion gene expression locus, demonstrated a copy number gain in a majority of tumors of both genotypes. Thus, by fusion oncogene coding sequence alone, SS18-SSX1 and SS18-SSX2 can each drive comparable synovial sarcomagenesis, independent from other genetic drivers. SS18-SSX1 and SS18-SSX2 tumor transcriptomes demonstrated very few consistent differences overall. In direct tumorigenesis comparisons, SS18-SSX2 was slightly more sarcomagenic than SS18-SSX1, but equivalent in its generation of biphasic histologic features. Meta-analysis of human synovial sarcoma patient series identified two tumor-genotype-phenotype correlations that were not modeled by the mice, namely a scarcity of male hosts and biphasic histologic features among SS18-SSX2 tumors. Re-analysis of human SS18-SSX1 and SS18-SSX2 tumor transcriptomes demonstrated very few consistent differences, but highlighted increased native SSX2 expression in SS18-SSX1 tumors. This suggests that the translocated locus may drive genotype-phenotype differences more than the coding sequence of the fusion gene created. Two possible roles for native SSX2 in synovial sarcomagenesis are explored. Thus even specific partial failures of mouse genetic modeling can be instructive to human tumor biology. PMID:26947017

  14. Coding sequence divergence between two closely related plant species: Arabidopsis thaliana and Brassica rapa ssp. pekinensis.

    PubMed

    Tiffin, Peter; Hahn, Matthew W

    2002-06-01

    To characterize the coding-sequence divergence of closely related genomes, we compared DNA sequence divergence between sequences from a Brassica rapa ssp. pekinensis EST library isolated from flower buds and genomic sequences from Arabidopsis thaliana. The specific objectives were (i) to determine the distribution of and relationship between K(a) and K(s), (ii) to identify genes with the lowest and highest K(a): K(s) values, and (iii) to evaluate how codon usage has diverged between two closely related species. We found that the distribution of K(a): K(s) was unimodal, and that substitution rates were more variable at nonsynonymous than synonymous sites, and detected no evidence that K(a) and K(s) were positively correlated. Several genes had K(a): K(s) values equal to or near zero, as expected for genes that have evolved under strong selective constraint. In contrast, there were no genes with K(a): K(s) >1 and thus we found no strong evidence that any of the 218 sequences we analyzed have evolved in response to positive selection. We detected a stronger codon bias but a lower frequency of GC at synonymous sites in A. thaliana than B. rapa. Moreover, there has been a shift in the profile of most commonly used synonymous codons since these two species diverged from one another. This shift in codon usage may have been caused by stronger selection acting on codon usage or by a shift in the direction of mutational bias in the B. rapa phylogenetic lineage.

  15. Variation in conserved non-coding sequences on chromosome 5q and susceptibility to asthma and atopy

    SciTech Connect

    Donfack, Joseph; Schneider, Daniel H.; Tan, Zheng; Kurz, Thorsten; Dubchak, Inna; Frazer, Kelly A.; Ober, Carole

    2005-09-10

    Background: Evolutionarily conserved sequences likely have biological function. Methods: To determine whether variation in conserved sequences in non-coding DNA contributes to risk for human disease, we studied six conserved non-coding elements in the Th2 cytokine cluster on human chromosome 5q31 in a large Hutterite pedigree and in samples of outbred European American and African American asthma cases and controls. Results: Among six conserved non-coding elements (>100 bp, >70 percent identity; human-mouse comparison), we identified one single nucleotide polymorphism (SNP) in each of two conserved elements and six SNPs in the flanking regions of three conserved elements. We genotyped our samples for four of these SNPs and an additional three SNPs each in the IL13 and IL4 genes. While there was only modest evidence for association with single SNPs in the Hutterite and European American samples (P<0.05), there were highly significant associations in European Americans between asthma and haplotypes comprised of SNPs in the IL4 gene (P<0.001), including a SNP in a conserved non-coding element. Furthermore, variation in the IL13 gene was strongly associated with total IgE (P = 0.00022) and allergic sensitization to mold allergens (P = 0.00076) in the Hutterites, and more modestly associated with sensitization to molds in the European Americans and African Americans (P<0.01). Conclusion: These results indicate that there is overall little variation in the conserved non-coding elements on 5q31, but variation in IL4 and IL13, including possibly one SNP in a conserved element, influences asthma and atopic phenotypes in diverse populations.
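
    The conservation criterion quoted in this record (more than 100 bp at more than 70 percent identity in a human-mouse comparison) can be illustrated with a sliding-window scan over a pairwise alignment. This is a minimal sketch under stated assumptions, not the pipeline the authors used: the inputs are taken to be two already-aligned strings of equal length with `-` for gaps, gap columns are counted as mismatches, and the function names and defaults are hypothetical.

```python
def percent_identity(a, b):
    """Percent of alignment columns in which both sequences carry the same base.

    Assumption: columns containing a gap ('-') count as mismatches.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def conserved_windows(a, b, window=100, min_identity=70.0):
    """0-based start columns of windows whose identity exceeds min_identity."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return [
        i
        for i in range(len(a) - window + 1)
        if percent_identity(a[i:i + window], b[i:i + window]) > min_identity
    ]
```

    Overlapping windows that pass the threshold would normally be merged into a single element before reporting; that step is left out to keep the sketch short.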

  16. First draft genome sequencing of indole acetic acid producing and plant growth promoting fungus Preussia sp. BSL10.

    PubMed

    Khan, Abdul Latif; Asaf, Sajjad; Khan, Abdur Rahim; Al-Harrasi, Ahmed; Al-Rawahi, Ahmed; Lee, In-Jung

    2016-05-10

    Preussia sp. BSL10, family Sporormiaceae, was actively producing phytohormone (indole-3-acetic acid) and extra-cellular enzymes (phosphatases and glucosidases). The fungus was also promoting the growth of the arid-land tree Boswellia sacra. Looking at such prospects of this fungus, we sequenced its draft genome for the first time. The Illumina based sequence analysis reveals an approximate genome size of 31.4 Mbp for Preussia sp. BSL10. Based on ab initio gene prediction, a total of 32,312 coding sequences were annotated consisting of 11,967 coding genes, pseudogenes, and 221 tRNA genes. Furthermore, 321 carbohydrate-active enzymes were predicted and classified into many functional families. PMID:26995610

  17. Transactivation specificity is conserved among p53 family proteins and depends on a response element sequence code

    PubMed Central

    Ciribilli, Yari; Monti, Paola; Bisio, Alessandra; Nguyen, H. Thien; Ethayathulla, Abdul S.; Ramos, Ana; Foggetti, Giorgia; Menichini, Paola; Menendez, Daniel; Resnick, Michael A.; Viadiu, Hector; Fronza, Gilberto; Inga, Alberto

    2013-01-01

    Structural and biochemical studies have demonstrated that p73, p63 and p53 recognize DNA with identical amino acids and similar binding affinity. Here, measuring transactivation activity for a large number of response elements (REs) in yeast and human cell lines, we show that p53 family proteins also have overlapping transactivation profiles. We identified mutations at conserved amino acids of loops L1 and L3 in the DNA-binding domain that tune the transactivation potential nearly equally in p73, p63 and p53. For example, the mutant S139F in p73 has higher transactivation potential towards selected REs, enhanced DNA-binding cooperativity in vitro and a flexible loop L1 as seen in the crystal structure of the protein–DNA complex. By studying how variations in the RE sequence affect transactivation specificity, we discovered a RE-transactivation code that predicts enhanced transactivation; this correlation is stronger for promoters of genes associated with apoptosis. PMID:23892287

  18. Complete nucleotide sequence and coding strategy of rice hoja blanca virus RNA4.

    PubMed

    Ramirez, B C; Lozano, I; Constantino, L M; Haenni, A L; Calvert, L A

    1993-11-01

    The complete sequence of rice hoja blanca virus (RHBV) RNA4 has been determined, based on the sequence of the corresponding cDNA clones. RNA4 consists of 1991 nucleotides with two open reading frames (ORFs). One putative ORF is located in the 5'-proximal region of the viral RNA4; it encodes a protein of predicted M(r) 20076 which corresponds to the major non-structural protein that accumulates in RHBV-infected rice plants, and which bears limited sequence identity with the helper component of tobacco vein mottling potyvirus. The other ORF is located in the 5'-proximal region of the viral complementary RNA4 and encodes a protein of predicted M(r) 32,469. Between the two ORFs is an intergenic region of 524 nucleotides, part of which can theoretically adopt a stable stem-loop structure; the 5' and 3' ends can potentially base-pair over 16 nucleotides, producing a pan-handle configuration. These characteristics are in favour of an ambisense coding strategy for RHBV RNA4. PMID:8245863
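
    An ambisense arrangement, as described in this record, means one ORF is read from the viral RNA and the other from its complement, each starting near the 5' end of its own strand. The sketch below scans both strands of an RNA string for ORFs. It is an illustration only: the helper names, the `min_codons` threshold, and the rule "AUG to first in-frame stop" are assumptions, and the input is assumed to be uppercase A/C/G/U.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")
STOP_CODONS = {"UAA", "UAG", "UGA"}


def reverse_complement(rna):
    """Reverse complement of an uppercase RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def find_orfs(rna, min_codons=2):
    """Return (start, end) half-open, 0-based spans running from an AUG
    through the first in-frame stop codon (stop included in the span)."""
    orfs = []
    for start in range(len(rna) - 2):
        if rna[start:start + 3] != "AUG":
            continue
        for pos in range(start, len(rna) - 2, 3):
            if rna[pos:pos + 3] in STOP_CODONS:
                # number of sense codons before the stop
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                break
    return orfs


def ambisense_orfs(rna, min_codons=2):
    """ORFs on the given strand and on its complementary strand.

    Coordinates for the complementary strand are relative to that
    strand's own 5' end.
    """
    return {
        "viral": find_orfs(rna, min_codons),
        "complementary": find_orfs(reverse_complement(rna), min_codons),
    }
```

    For a toy sequence built from a short ORF, a spacer, and the reverse complement of a second short ORF, each strand reports one ORF starting at position 0 of its own 5' end, which mirrors the layout the abstract describes.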

  19. CSTminer: a web tool for the identification of coding and noncoding conserved sequence tags through cross-species genome comparison.

    PubMed

    Castrignanò, Tiziana; Canali, Alessandro; Grillo, Giorgio; Liuni, Sabino; Mignone, Flavio; Pesole, Graziano

    2004-07-01

    The identification and characterization of genome tracts that are highly conserved across species during evolution may contribute significantly to the functional annotation of whole-genome sequences. Indeed, such sequences are likely to correspond to known or unknown coding exons or regulatory motifs. Here, we present a web server implementing a previously developed algorithm that, by comparing user-submitted genome sequences, is able to identify statistically significant conserved blocks and assess their coding or noncoding nature through the measure of a coding potential score. The web tool, available at http://www.caspur.it/CSTminer/, is dynamically interconnected with the Ensembl genome resources and produces a graphical output showing a map of detected conserved sequences and annotated gene features.

  20. Enzyme-free translation of DNA into sequence-defined synthetic polymers structurally unrelated to nucleic acids

    NASA Astrophysics Data System (ADS)

    Niu, Jia; Hili, Ryan; Liu, David R.

    2013-04-01

    The translation of DNA sequences into corresponding biopolymers enables the production, function and evolution of the macromolecules of life. In contrast, methods to generate sequence-defined synthetic polymers with similar levels of control have remained elusive. Here, we report the development of a DNA-templated translation system that enables the enzyme-free translation of DNA templates into sequence-defined synthetic polymers that have no necessary structural relationship with nucleic acids. We demonstrate the efficiency, sequence-specificity and generality of this translation system by oligomerizing building blocks including polyethylene glycol, α-(D)-peptides, and β-peptides in a DNA-programmed manner. Sequence-defined synthetic polymers with molecular weights of 26 kDa containing 16 consecutively coupled building blocks and 90 densely functionalized β-amino acid residues were translated from DNA templates using this strategy. We integrated the DNA-templated translation system developed here into a complete cycle of translation, coding sequence replication, template regeneration and re-translation suitable for the iterated in vitro selection of functional sequence-defined synthetic polymers unrelated in structure to nucleic acids.

  1. Composition and phylogenetic analysis of vitellogenin coding sequences in the Indonesian coelacanth Latimeria menadoensis.

    PubMed

    Canapa, Adriana; Olmo, Ettore; Forconi, Mariko; Pallavicini, Alberto; Makapedua, Monica Daisy; Biscotti, Maria Assunta; Barucca, Marco

    2012-07-01

    The coelacanth Latimeria menadoensis, a living fossil, occupies a key phylogenetic position to explore the changes that have affected the genomes of the aquatic vertebrates that colonized dry land. This is the first study to isolate and analyze L. menadoensis mRNA. Three different vitellogenin transcripts were identified and their inferred amino acid sequences compared to those of other known vertebrates. The phylogenetic data suggest that the evolutionary history of this gene family in coelacanths was characterized by a different duplication event than those which occurred in teleosts, amniotes, and amphibia. Comparison of the three sequences highlighted differences in functional sites. Moreover, despite the presence of conserved sites compared with the other oviparous vertebrates, some sites were seen to have changed, others to be similar only to those of teleosts, and others still to resemble only those of tetrapods.

  2. Phylogenetic relationships among insect orders based on three nuclear protein-coding gene sequences.

    PubMed

    Ishiwata, Keisuke; Sasaki, Go; Ogawa, Jiro; Miyata, Takashi; Su, Zhi-Hui

    2011-02-01

    Many attempts to resolve the phylogenetic relationships of higher groups of insects have been made based on both morphological and molecular evidence; nonetheless, most of the interordinal relationships of insects remain unclear or are controversial. As a new approach, in this study we sequenced three nuclear genes encoding the catalytic subunit of DNA polymerase delta and the two largest subunits of RNA polymerase II from all insect orders. The predicted amino acid sequences (in total, approx. 3500 amino acid sites) of these proteins were subjected to phylogenetic analyses based on the maximum likelihood and Bayesian analysis methods with various models. The resulting trees strongly support the monophyly of Palaeoptera, Neoptera, Polyneoptera, and Holometabola, while within Polyneoptera, the groupings of Isoptera/"Blattaria"/Mantodea (Superorder Dictyoptera), Dictyoptera/Zoraptera, Dermaptera/Plecoptera, Mantophasmatodea/Grylloblattodea, and Embioptera/Phasmatodea are supported. Although Paraneoptera is not supported as a monophyletic group, the grouping of Phthiraptera/Psocoptera is robustly supported. The interordinal relationships within Holometabola are well resolved, and the order Hymenoptera is strongly supported as the sister lineage to all other holometabolous insects. The other orders of Holometabola are separated into two large groups, and the interordinal relationships of each group are (((Siphonaptera, Mecoptera), Diptera), (Trichoptera, Lepidoptera)) and ((Coleoptera, Strepsiptera), (Neuroptera, Raphidioptera, Megaloptera)). The sister relationship between Strepsiptera and Diptera is significantly rejected by all the statistical tests (AU, KH and wSH), while the affinity between Hymenoptera and Mecopterida is significantly rejected only by the AU and KH tests. Our results show that the use of amino acid sequences of these three nuclear genes is an effective approach for resolving the relationships of higher groups of insects. PMID:21075208

  3. Detecting selection in the blue crab, Callinectes sapidus, using DNA sequence data from multiple nuclear protein-coding genes.

    PubMed

    Yednock, Bree K; Neigel, Joseph E

    2014-01-01

    The identification of genes involved in the adaptive evolution of non-model organisms with uncharacterized genomes constitutes a major challenge. This study employed a rigorous and targeted candidate gene approach to test for positive selection on protein-coding genes of the blue crab, Callinectes sapidus. Four genes with putative roles in physiological adaptation to environmental stress were chosen as candidates. A fifth gene not expected to play a role in environmental adaptation was used as a control. Large samples (n>800) of DNA sequences from C. sapidus were used in tests of selective neutrality based on sequence polymorphisms. In combination with these, sequences from the congener C. similis were used in neutrality tests based on interspecific divergence. In multiple tests, significant departures from neutral expectations, indicative of positive selection, were found for the candidate gene trehalose 6-phosphate synthase (tps). These departures could not be explained by any of the historical population expansion or bottleneck scenarios that were evaluated in coalescent simulations. Evidence was also found for balancing selection at ATP-synthase subunit 9 (atps) using a maximum likelihood version of the Hudson, Kreitman, and Aguadé test, and positive selection favoring amino acid replacements within ATP/ADP translocase (ant) was detected using the McDonald-Kreitman test. In contrast, test statistics for the control gene, ribosomal protein L12 (rpl), which presumably has experienced the same demographic effects as the candidate loci, were not significantly different from neutral expectations and could readily be explained by demographic effects. Together, these findings demonstrate the utility of the candidate gene approach for investigating adaptation at the molecular level in a marine invertebrate for which extensive genomic resources are not available.
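
    The McDonald-Kreitman test mentioned in this record compares nonsynonymous and synonymous counts for fixed differences between species (Dn, Ds) with the same counts for polymorphisms within a species (Pn, Ps) in a 2×2 table. The sketch below shows one common way to evaluate such a table, a two-sided Fisher exact test plus the neutrality index (Pn/Ps)/(Dn/Ds). The abstract does not say which statistic or software the authors used, so the choice of Fisher's test, the function names, and any counts passed in are assumptions for illustration, not data from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # small relative tolerance guards against floating-point ties
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def mcdonald_kreitman(dn, ds, pn, ps):
    """Return (neutrality_index, p_value) for an MK table.

    neutrality_index = (Pn/Ps) / (Dn/Ds); values below 1 indicate an
    excess of nonsynonymous fixed differences. Requires ps, dn, ds > 0.
    """
    neutrality_index = (pn / ps) / (dn / ds)
    return neutrality_index, fisher_exact_two_sided(dn, ds, pn, ps)
```

    A neutrality index below 1 with a small p-value is the pattern usually read as evidence for positive selection on amino acid replacements; an index above 1 points instead toward segregating, weakly deleterious variants.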

  4. Buffalo (Bubalus bubalis) interleukin-2: sequence analysis reveals high nucleotide and amino acid identity with interleukin-2 of cattle and other ruminants.

    PubMed

    Sreekumar, E; Premraj, A; Saravanakumar, M; Rasool, T J

    2002-08-01

    A 4400-bp genomic sequence and a 332-bp truncated cDNA sequence of the interleukin-2 (IL-2) gene of Indian water buffalo (Bubalus bubalis) were amplified by polymerase chain reaction and cloned. The coding sequence of the buffalo IL-2 gene was assembled from the 5' end of the genomic clone and the truncated cDNA clone. This sequence had 98.5% nucleotide identity and 98% amino acid identity with cattle IL-2. Three amino acid substitutions were observed at positions 63, 124 and 135. Comparison of the predicted protein structure of buffalo IL-2 with that of human and cattle IL-2 did not reveal significant differences. The putative amino acids responsible for IL-2 receptor binding were conserved in buffalo, cattle and human IL-2. The amino acid sequence of buffalo IL-2 also showed very high identity with that of other ruminants, indicating functional cross-reactivity.

  5. Beyond Junk-Variable Tandem Repeats as Facilitators of Rapid Evolution of Regulatory and Coding Sequences

    PubMed Central

    Gemayel, Rita; Cho, Janice; Boeynaems, Steven; Verstrepen, Kevin J.

    2012-01-01

    Copy Number Variations (CNVs) and Single Nucleotide Polymorphisms (SNPs) have been the major focus of most large-scale comparative genomics studies to date. Here, we discuss a third, largely ignored, type of genetic variation, namely changes in tandem repeat number. Historically, tandem repeats have been designated as non functional “junk” DNA, mostly as a result of their highly unstable nature. With the exception of tandem repeats involved in human neurodegenerative diseases, repeat variation was often believed to be neutral with no phenotypic consequences. Recent studies, however, have shown that as many as 10% to 20% of coding and regulatory sequences in eukaryotes contain an unstable repeat tract. Contrary to initial suggestions, tandem repeat variation can have useful phenotypic consequences. Examples include rapid variation in microbial cell surface, tuning of internal molecular clocks in flies and the dynamic morphological plasticity in mammals. As such, tandem repeats can be useful functional elements that facilitate evolvability and rapid adaptation. PMID:24704980

  6. Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes

    PubMed Central

    Tennessen, Jacob A.; Bigham, Abigail W.; O'Connor, Timothy D.; Fu, Wenqing; Kenny, Eimear E.; Gravel, Simon; McGee, Sean; Do, Ron; Liu, Xiaoming; Jun, Goo; Kang, Hyun Min; Jordan, Daniel; Leal, Suzanne M.; Gabriel, Stacey; Rieder, Mark J.; Abecasis, Goncalo; Altshuler, David; Nickerson, Deborah A.; Boerwinkle, Eric; Sunyaev, Shamil; Bustamante, Carlos D.; Bamshad, Michael J.; Akey, Joshua M.

    2013-01-01

    As a first step toward understanding how rare variants contribute to risk for complex diseases, we sequenced 15,585 human protein-coding genes to an average median depth of 111× in 2440 individuals of European (n = 1351) and African (n = 1088) ancestry. We identified over 500,000 single-nucleotide variants (SNVs), the majority of which were rare (86% with a minor allele frequency less than 0.5%), previously unknown (82%), and population-specific (82%). On average, 2.3% of the 13,595 SNVs each person carried were predicted to affect protein function of ∼313 genes per genome, and ∼95.7% of SNVs predicted to be functionally important were rare. This excess of rare functional variants is due to the combined effects of explosive, recent accelerated population growth and weak purifying selection. Furthermore, we show that large sample sizes will be required to associate rare variants with complex traits. PMID:22604720

  7. The influence of viral coding sequences on pestivirus IRES activity reveals further parallels with translation initiation in prokaryotes.

    PubMed Central

    Fletcher, Simon P; Ali, Iraj K; Kaminski, Ann; Digard, Paul; Jackson, Richard J

    2002-01-01

    Classical swine fever virus (CSFV) is a member of the pestivirus family, which shares many features in common with hepatitis C virus (HCV). It is shown here that CSFV has an exceptionally efficient cis-acting internal ribosome entry segment (IRES), which, like that of HCV, is strongly influenced by the sequences immediately downstream of the initiation codon, and is optimal with viral coding sequences in this position. Constructs that retained 17 or more codons of viral coding sequence exhibited full IRES activity, but with only 12 codons, activity was approximately 66% of maximum in vitro (though close to maximum in transfected BHK cells), whereas with just 3 codons or fewer, the activity was only approximately 15% of maximum. The minimal coding region elements required for high activity were exchanged between HCV and CSFV. Although maximum activity was observed in each case with the homologous combination of coding region and 5' UTR, the heterologous combinations were sufficiently active to rule out a highly specific functional interplay between the 5' UTR and coding sequences. On the other hand, inversion of the coding sequences resulted in low IRES activity, particularly with the HCV coding sequences. RNA structure probing showed that the efficiency of internal initiation of these chimeric constructs correlated most closely with the degree of single-strandedness of the region around and immediately downstream of the initiation codon. The low activity IRESs could not be rescued by addition of supplementary eIF4A (the initiation factor with ATP-dependent RNA helicase activity). The extreme sensitivity to secondary structure around the initiation codon is likely to be due to the fact that the eIF4F complex (which has eIF4A as one of its subunits) is not required for and does not participate in initiation on these IRESs. PMID:12515388

  8. Current status and new features of the Consensus Coding Sequence database

    PubMed Central

    Farrell, Catherine M.; O’Leary, Nuala A.; Harte, Rachel A.; Loveland, Jane E.; Wilming, Laurens G.; Wallin, Craig; Diekhans, Mark; Barrell, Daniel; Searle, Stephen M. J.; Aken, Bronwen; Hiatt, Susan M.; Frankish, Adam; Suner, Marie-Marthe; Rajput, Bhanu; Steward, Charles A.; Brown, Garth R.; Bennett, Ruth; Murphy, Michael; Wu, Wendy; Kay, Mike P.; Hart, Jennifer; Rajan, Jeena; Weber, Janet; Snow, Catherine; Riddick, Lillian D.; Hunt, Toby; Webb, David; Thomas, Mark; Tamez, Pamela; Rangwala, Sanjida H.; McGarvey, Kelly M.; Pujar, Shashikant; Shkeda, Andrei; Mudge, Jonathan M.; Gonzalez, Jose M.; Gilbert, James G. R.; Trevanion, Stephen J.; Baertsch, Robert; Harrow, Jennifer L.; Hubbard, Tim; Ostell, James M.; Haussler, David; Pruitt, Kim D.

    2014-01-01

    The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets. PMID:24217909

  9. Amino acid substitutions in genetic variants of human serum albumin and in sequences inferred from molecular cloning

    SciTech Connect

    Takahashi, N.; Takahashi, Y.; Blumberg, B.S.; Putnam, F.W.

    1987-07-01

    The structural changes in four genetic variants of human serum albumin were analyzed by tandem high-pressure liquid chromatography (HPLC) of the tryptic peptides, HPLC mapping and isoelectric focusing of the CNBr fragments, and amino acid sequence analysis of the purified peptides. Lysine-372 of normal (common) albumin A was changed to glutamic acid both in albumin Naskapi, a widespread polymorphic variant of North American Indians, and in albumin Mersin found in Eti Turks. The two variants also exhibited anomalous migration in NaDodSO4/PAGE, which is attributed to a conformational change. The identity of albumins Naskapi and Mersin may have originated through descent from a common mid-Asiatic founder of the two migrating ethnic groups, or it may represent identical but independent mutations of the albumin gene. In albumin Adana, from Eti Turks, the substitution site was not identified but was localized to the region from positions 447 through 548. The substitution of aspartic acid-550 by glycine was found in albumin Mexico-2 from four individuals of the Pima tribe. Although only single-point substitutions have been found in these and in certain other genetic variants of human albumin, five differences exist in the amino acid sequences inferred from cDNA sequences by workers in three other laboratories. However, our results on albumin A and on 14 different genetic variants accord with the amino acid sequence of albumin deduced from the genomic sequence. The apparent amino acid substitutions inferred from comparison of individual cDNA sequences probably reflect artifacts in cloning or in cDNA sequence analysis rather than polymorphism of the coding sections of the albumin gene.

  10. Amino acid substitutions in genetic variants of human serum albumin and in sequences inferred from molecular cloning.

    PubMed

    Takahashi, N; Takahashi, Y; Blumberg, B S; Putnam, F W

    1987-07-01

    The structural changes in four genetic variants of human serum albumin were analyzed by tandem high-pressure liquid chromatography (HPLC) of the tryptic peptides, HPLC mapping and isoelectric focusing of the CNBr fragments, and amino acid sequence analysis of the purified peptides. Lysine-372 of normal (common) albumin A was changed to glutamic acid both in albumin Naskapi, a widespread polymorphic variant of North American Indians, and in albumin Mersin found in Eti Turks. The two variants also exhibited anomalous migration in NaDodSO4/PAGE, which is attributed to a conformational change. The identity of albumins Naskapi and Mersin may have originated through descent from a common mid-Asiatic founder of the two migrating ethnic groups, or it may represent identical but independent mutations of the albumin gene. In albumin Adana, from Eti Turks, the substitution site was not identified but was localized to the region from positions 447 through 548. The substitution of aspartic acid-550 by glycine was found in albumin Mexico-2 from four individuals of the Pima tribe. Although only single-point substitutions have been found in these and in certain other genetic variants of human albumin, five differences exist in the amino acid sequences inferred from cDNA sequences by workers in three other laboratories. However, our results on albumin A and on 14 different genetic variants accord with the amino acid sequence of albumin deduced from the genomic sequence. The apparent amino acid substitutions inferred from comparison of individual cDNA sequences probably reflect artifacts in cloning or in cDNA sequence analysis rather than polymorphism of the coding sections of the albumin gene.

  11. Cloning, sequencing, and expression of the apa gene coding for the Mycobacterium tuberculosis 45/47-kilodalton secreted antigen complex.

    PubMed

    Laqueyrerie, A; Militzer, P; Romain, F; Eiglmeier, K; Cole, S; Marchal, G

    1995-10-01

    Effective protection against a virulent challenge with Mycobacterium tuberculosis is induced mainly by previous immunization with living attenuated mycobacteria, and it has been hypothesized that secreted proteins serve as major targets in the specific immune response. To identify and purify molecules present in culture medium filtrate which are dominant antigens during effective vaccination, a two-step selection procedure was used to select antigens able to interact with T lymphocytes and/or antibodies induced by immunization with living bacteria and to counterselect antigens interacting with the immune effectors induced by immunization with dead bacteria. A Mycobacterium bovis BCG 45/47-kDa antigen complex, present in BCG culture filtrate, has been previously identified and isolated (F. Romain, A. Laqueyrerie, P. Militzer, P. Pescher, P. Chavarot, M. Lagranderie, G. Auregan, M. Gheorghiu, and G. Marchal, Infect. Immun. 61:742-750, 1993). Since the cognate antibodies recognize the very same antigens present in M. tuberculosis culture medium filtrates, a project was undertaken to clone, express, and sequence the corresponding gene of M. tuberculosis. An M. tuberculosis shuttle cosmid library was transferred in Mycobacterium smegmatis and screened with a competitive enzyme-linked immunosorbent assay to detect the clones expressing the proteins. A clone containing a 40-kb DNA insert was selected, and by means of subcloning in Escherichia coli, a 2-kb fragment that coded for the molecules was identified. An open reading frame in the 2,061-nucleotide sequence codes for a secreted protein with a consensus signal peptide of 39 amino acids and a predicted molecular mass of 28,779 Da. The gene was referred to as apa because of the high percentages of proline (21.7%) and alanine (19%) in the purified protein. Southern hybridization analysis of digested total genomic DNA from M. tuberculosis (reference strains H37Rv and H37Ra) indicated that the apa gene was present as a

  12. Natural selection on coding and noncoding DNA sequences is associated with virulence genes in a plant pathogenic fungus.

    PubMed

    Rech, Gabriel E; Sanz-Martín, José M; Anisimova, Maria; Sukno, Serenella A; Thon, Michael R

    2014-09-04

    Natural selection leaves imprints on DNA, offering the opportunity to identify functionally important regions of the genome. Identifying the genomic regions affected by natural selection within pathogens can aid in the pursuit of effective strategies to control diseases. In this study, we analyzed genome-wide patterns of selection acting on different classes of sequences in a worldwide sample of eight strains of the model plant-pathogenic fungus Colletotrichum graminicola. We found evidence of selective sweeps, balancing selection, and positive selection affecting both protein-coding and noncoding DNA of pathogenicity-related sequences. Genes encoding putative effector proteins and secondary metabolite biosynthetic enzymes show evidence of positive selection acting on the coding sequence, consistent with an Arms Race model of evolution. The 5' untranslated regions (UTRs) of genes coding for effector proteins and genes upregulated during infection show an excess of high-frequency polymorphisms likely the consequence of balancing selection and consistent with the Red Queen hypothesis of evolution acting on these putative regulatory sequences. Based on the findings of this work, we propose that even though adaptive substitutions on coding sequences are important for proteins that interact directly with the host, polymorphisms in the regulatory sequences may confer flexibility of gene expression in the virulence processes of this important plant pathogen.

  13. Natural Selection on Coding and Noncoding DNA Sequences Is Associated with Virulence Genes in a Plant Pathogenic Fungus

    PubMed Central

    Rech, Gabriel E.; Sanz-Martín, José M.; Anisimova, Maria; Sukno, Serenella A.; Thon, Michael R.

    2014-01-01

    Natural selection leaves imprints on DNA, offering the opportunity to identify functionally important regions of the genome. Identifying the genomic regions affected by natural selection within pathogens can aid in the pursuit of effective strategies to control diseases. In this study, we analyzed genome-wide patterns of selection acting on different classes of sequences in a worldwide sample of eight strains of the model plant-pathogenic fungus Colletotrichum graminicola. We found evidence of selective sweeps, balancing selection, and positive selection affecting both protein-coding and noncoding DNA of pathogenicity-related sequences. Genes encoding putative effector proteins and secondary metabolite biosynthetic enzymes show evidence of positive selection acting on the coding sequence, consistent with an Arms Race model of evolution. The 5′ untranslated regions (UTRs) of genes coding for effector proteins and genes upregulated during infection show an excess of high-frequency polymorphisms likely the consequence of balancing selection and consistent with the Red Queen hypothesis of evolution acting on these putative regulatory sequences. Based on the findings of this work, we propose that even though adaptive substitutions on coding sequences are important for proteins that interact directly with the host, polymorphisms in the regulatory sequences may confer flexibility of gene expression in the virulence processes of this important plant pathogen. PMID:25193312

  15. An alternative strategy to generate coding sequence of macrophage migration inhibitory factor-2 of Wuchereria bancrofti

    PubMed Central

    Chauhan, Nikhil; Hoti, S.L.

    2016-01-01

    Background & objectives: Different developmental stages of Wuchereria bancrofti, the major causal organism of lymphatic filariasis (LF), are difficult to obtain. Besides this limitation, to obtain the complete coding sequence (CDS) of a gene one has to isolate mRNA and perform subsequent cDNA synthesis, which is laborious and not successful at times. In this study, an alternative strategy employing polymerase chain reaction (PCR) was optimized and validated to generate the CDS of Macrophage migration Inhibitory Factor-2 (wbMIF-2), a gene expressed in the transition stage between L3 and L4. Methods: The genomic DNA of W. bancrofti microfilariae was extracted and used to amplify the full-length wbMIF-2 gene (4.275 kb). This amplified product was used as a template for amplifying the exons separately, using overlapping primers, which were then assembled through another round of PCR. Results: A simple strategy was developed based on PCR, which is used routinely in molecular biology laboratories. The amplified CDS of 363 bp of wbMIF-2 generated using the genomic DNA splicing technique was devoid of any intronic sequence. Interpretation & conclusions: The cDNA of the wbMIF-2 gene was successfully amplified from genomic DNA of the microfilarial stage of W. bancrofti, thus circumventing the use of the inaccessible L3-L4 transitional stage of this parasite. This strategy is useful for generating CDS of genes from parasites that have restricted availability. PMID:27121522
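    The exon-assembly idea in this abstract can be illustrated in silico. The following is a minimal sketch, assuming exon coordinates are already known; the toy genomic string, the coordinates, and the function name `assemble_cds` are hypothetical and are not taken from the wbMIF-2 gene.

```python
def assemble_cds(genomic: str, exons: list[tuple[int, int]]) -> str:
    """Join exon slices (0-based start, end-exclusive) in genomic order.

    This mimics, on a string, what the overlapping-primer PCR does on DNA:
    introns are dropped and exons are concatenated into one CDS.
    """
    cds = "".join(genomic[start:end] for start, end in sorted(exons))
    if len(cds) % 3 != 0:
        raise ValueError("assembled CDS is not a whole number of codons")
    return cds.upper()


# Toy example (hypothetical): two exons in upper case, one intron in lower case.
toy_genomic = "ATGGCC" + "gtaagtttcag" + "AAGTAA"
toy_exons = [(0, 6), (17, 23)]
toy_cds = assemble_cds(toy_genomic, toy_exons)
```

    For the toy input, `toy_cds` contains only the two exon segments joined in frame; real coordinates would have to come from a gene annotation.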

  16. Detection by real time PCR of walnut allergen coding sequences in processed foods.

    PubMed

    Linacero, Rosario; Ballesteros, Isabel; Sanchiz, Africa; Prieto, Nuria; Iniesto, Elisa; Martinez, Yolanda; Pedrosa, Mercedes M; Muzquiz, Mercedes; Cabanillas, Beatriz; Rovira, Mercè; Burbano, Carmen; Cuadrado, Carmen

    2016-07-01

    A quantitative real-time PCR (RT-PCR) method, employing novel primer sets designed on Jug r 1, Jug r 3, and Jug r 4 allergen-coding sequences, was set up and validated. Its specificity, sensitivity, and applicability were evaluated. The DNA extraction method based on CTAB-phenol-chloroform was best for walnut. RT-PCR allowed a specific and accurate amplification of allergen sequence, and the limit of detection was 2.5 pg of walnut DNA. The method sensitivity and robustness were confirmed with spiked samples, and Jug r 3 primers detected up to 100 mg/kg of raw walnut (LOD 0.01%, LOQ 0.05%). Thermal treatment combined with pressure (autoclaving) reduced yield and amplification (integrity and quality) of walnut DNA. High hydrostatic pressure (HHP) did not produce any effect on the walnut DNA amplification. This RT-PCR method showed greater sensitivity and reliability in the detection of walnut traces in commercial foodstuffs compared with ELISA assays. PMID:26920302

  17. Predicting protein disorder by analyzing amino acid sequence

    PubMed Central

    Yang, Jack Y; Yang, Mary Qu

    2008-01-01

    Background: Many protein regions and some entire proteins have no definite tertiary structure, presenting instead as dynamic, disordered ensembles under different physicochemical circumstances. These proteins and regions are known as Intrinsically Unstructured Proteins (IUP). IUP have been associated with a wide range of protein functions, along with roles in diseases characterized by protein misfolding and aggregation. Results: Identifying IUP is an important task in structural and functional genomics. We extract useful features from sequences and develop machine learning algorithms for this task. We compare our IUP predictor with PONDRs (mainly neural-network-based predictors), disEMBL (also based on neural networks) and Globplot (based on disorder propensity). Conclusion: We find that augmenting features derived from physicochemical properties of amino acids (such as hydrophobicity, complexity, etc.) and using an ensemble method proved beneficial. The IUP predictor is a viable alternative software tool for identifying IUP protein regions and proteins. PMID:18831799
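    The abstract names hydrophobicity and complexity as sequence-derived features but does not say how they were computed. The following is a minimal sketch under stated assumptions: the Kyte-Doolittle hydropathy scale and Shannon entropy over a sliding window are stand-ins chosen for illustration, and the window size of 7 and the function name `window_features` are hypothetical rather than the authors' settings.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(seq: str, window: int = 7) -> list[tuple[float, float]]:
    """Return (mean hydropathy, Shannon entropy in bits) for each window.

    Low entropy (low complexity) and low hydropathy are commonly used
    as hints of disorder; non-standard residues raise KeyError here.
    """
    seq = seq.upper()
    features = []
    for i in range(len(seq) - window + 1):
        segment = seq[i:i + window]
        hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        counts = Counter(segment)
        entropy = -sum((c / window) * math.log2(c / window)
                       for c in counts.values())
        features.append((hydropathy, entropy))
    return features
```

    Feature vectors of this kind could then be passed to any classifier; the ensemble method mentioned in the abstract is not reproduced here.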

  18. The all pervasive principle of repetitious recurrence governs not only coding sequence construction but also human endeavor in musical composition.

    PubMed

    Ohno, S; Ohno, M

    1986-01-01

    Organisms which have evolved on this earth are governed by multitudes of periodicities; tomorrow is another today, and the next year is going to be much like this year. Accordingly, the principle of repetitious recurrence pervades every aspect of life on this earth. Thus, individual genes in the genome have been duplicated and triplicated often to the point of redundancy, and each coding sequence consists of numerous variously truncated as well as variously base-substituted copies of the original primordial building block base oligomers and their allies. This principle even appears to govern the manifestations of human intellect; musical compositions also rely on this principle of repetitious recurrence. Accordingly, coding base sequences can be transformed into musical scores using one set rule. Conversely, musical scores can be transcribed to coding base sequences of long open reading frames.

  19. Host range selection of vaccinia recombinants containing insertions of foreign genes into non-coding sequences.

    PubMed

    Smith, K A; Stallard, V; Roos, J M; Hart, C; Cormier, N; Cohen, L K; Roberts, B E; Payne, L G

    1993-01-01

    A simple yet powerful selection system was developed for the insertion of foreign genes in vaccinia virus. The selection system utilizes the vaccinia virus K1L (29K) host range gene which is located in HindIII M. This gene is necessary for growth in RK-13 cells but not in BSC40 or CV-1 cells. A vaccinia mutant (vAbT33) unable to grow on RK-13 cells was constructed having sequences at the 3' end of the K1L gene and the adjacent M2L gene deleted and replaced with the beta-galactosidase gene regulated by the BamHI F (F7L) promoter. A recombination plasmid containing the hepatitis B surface (HBs) antigen gene regulated by the M2L promoter and the complete sequence of the K1L gene was used to insert the HBs gene into vAbT33. The M2L negative K1L positive recombinant was easily isolated in two rounds of plaque purification by plating the virus on RK-13 cell monolayers. The K1L gene selection system allows the isolation of recombinants arising at frequencies as low as 1/100,000. It was noted that recombinants containing vaccinia sequence duplications (promoters) resulted in intragenomic recombinations that eliminated all sequences between the duplications. A second recombination plasmid was constructed that allowed insertion into the vaccinia genome without the loss of vaccinia coding sequences. This was achieved by insertion of the pseudorabies virus GIII gene regulated by the vaccinia H5R (40K) promoter between the translation and transcription stop signals at the 3' end of the K1L gene. The K1L gene transcription stop signal thus became the stop signal for the inserted GIII gene and an upstream transcription stop signal present in the H5R promoter fragment provided the stop signal for the K1L gene. This manipulation of the vaccinia genome had no effect on the accumulation or 5' end of the M2L gene transcripts. Although the insertion lengthened the 3' end and lowered the accumulation of K1L transcripts it altered neither the virulence nor the immunogenicity of the

  20. Biased Gene Conversion and GC-Content Evolution in the Coding Sequences of Reptiles and Vertebrates

    PubMed Central

    Figuet, Emeric; Ballenghien, Marion; Romiguier, Jonathan; Galtier, Nicolas

    2015-01-01

    Mammalian and avian genomes are characterized by a substantial spatial heterogeneity of GC-content, which is often interpreted as reflecting the effect of local GC-biased gene conversion (gBGC), a meiotic repair bias that favors G and C over A and T alleles in high-recombining genomic regions. Surprisingly, the first fully sequenced nonavian sauropsid (i.e., reptile), the green anole Anolis carolinensis, revealed a highly homogeneous genomic GC-content landscape, suggesting the possibility that gBGC might not be at work in this lineage. Here, we analyze GC-content evolution at third-codon positions (GC3) in 44 vertebrate species, including eight newly sequenced transcriptomes, with a specific focus on nonavian sauropsids. We report that reptiles, including the green anole, have a genome-wide distribution of GC3 similar to that of mammals and birds, and we infer a strong GC3-heterogeneity to be already present in the tetrapod ancestor. We further show that the dynamic of coding sequence GC-content is largely governed by karyotypic features in vertebrates, notably in the green anole, in agreement with the gBGC hypothesis. The discrepancy between third-codon positions and noncoding DNA regarding GC-content dynamics in the green anole could not be explained by the activity of transposable elements or selection on codon usage. This analysis highlights the unique value of third-codon positions as an insertion/deletion-free marker of nucleotide substitution biases that ultimately affect the evolution of proteins. PMID:25527834
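    GC3, the statistic this abstract relies on, is the fraction of G or C bases at third codon positions. The following is a minimal sketch, assuming an in-frame coding sequence; the function name `gc3` and the example sequence are hypothetical, and whether the stop codon is counted is a choice the abstract does not specify (it is counted here).

```python
def gc3(cds: str) -> float:
    """Fraction of G/C among third codon positions of an in-frame CDS."""
    seq = cds.upper()
    if not seq or len(seq) % 3 != 0:
        raise ValueError("CDS must be non-empty and a multiple of 3 long")
    third_positions = seq[2::3]  # every third base, starting at index 2
    gc_count = sum(base in "GC" for base in third_positions)
    return gc_count / len(third_positions)


# Codons: ATG GCC AAG TTC GGA TAA -> third bases G, C, G, C, A, A
example_cds = "ATGGCCAAGTTCGGATAA"
example_gc3 = gc3(example_cds)
```

    In the example, four of the six third-position bases are G or C.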

  1. Biased gene conversion and GC-content evolution in the coding sequences of reptiles and vertebrates.

    PubMed

    Figuet, Emeric; Ballenghien, Marion; Romiguier, Jonathan; Galtier, Nicolas

    2014-12-19

    Mammalian and avian genomes are characterized by a substantial spatial heterogeneity of GC-content, which is often interpreted as reflecting the effect of local GC-biased gene conversion (gBGC), a meiotic repair bias that favors G and C over A and T alleles in high-recombining genomic regions. Surprisingly, the first fully sequenced nonavian sauropsid (i.e., reptile), the green anole Anolis carolinensis, revealed a highly homogeneous genomic GC-content landscape, suggesting the possibility that gBGC might not be at work in this lineage. Here, we analyze GC-content evolution at third-codon positions (GC3) in 44 vertebrate species, including eight newly sequenced transcriptomes, with a specific focus on nonavian sauropsids. We report that reptiles, including the green anole, have a genome-wide distribution of GC3 similar to that of mammals and birds, and we infer a strong GC3-heterogeneity to be already present in the tetrapod ancestor. We further show that the dynamic of coding sequence GC-content is largely governed by karyotypic features in vertebrates, notably in the green anole, in agreement with the gBGC hypothesis. The discrepancy between third-codon positions and noncoding DNA regarding GC-content dynamics in the green anole could not be explained by the activity of transposable elements or selection on codon usage. This analysis highlights the unique value of third-codon positions as an insertion/deletion-free marker of nucleotide substitution biases that ultimately affect the evolution of proteins.

  2. The Hypothesis that the Genetic Code Originated in Coupled Synthesis of Proteins and the Evolutionary Predecessors of Nucleic Acids in Primitive Cells

    PubMed Central

    Francis, Brian R.

    2015-01-01

    Although analysis of the genetic code has allowed explanations for its evolution to be proposed, little evidence exists in biochemistry and molecular biology to offer an explanation for the origin of the genetic code. In particular, two features of biology make the origin of the genetic code difficult to understand. First, nucleic acids are highly complicated polymers requiring numerous enzymes for biosynthesis. Secondly, proteins have a simple backbone with a set of 20 different amino acid side chains synthesized by a highly complicated ribosomal process in which mRNA sequences are read in triplets. Apparently, both nucleic acid and protein syntheses have extensive evolutionary histories. Supporting these processes is a complex metabolism and at the hub of metabolism are the carboxylic acid cycles. This paper advances the hypothesis that the earliest predecessor of the nucleic acids was a β-linked polyester made from malic acid, a highly conserved metabolite in the carboxylic acid cycles. In the β-linked polyester, the side chains are carboxylic acid groups capable of forming interstrand double hydrogen bonds. Evolution of the nucleic acids involved changes to the backbone and side chain of poly(β-d-malic acid). Conversion of the side chain carboxylic acid into a carboxamide or a longer side chain bearing a carboxamide group, allowed information polymers to form amide pairs between polyester chains. Aminoacylation of the hydroxyl groups of malic acid and its derivatives with simple amino acids such as glycine and alanine allowed coupling of polyester synthesis and protein synthesis. Use of polypeptides containing glycine and l-alanine for activation of two different monomers with either glycine or l-alanine allowed simple coded autocatalytic synthesis of polyesters and polypeptides and established the first genetic code. A primitive cell capable of supporting electron transport, thioester synthesis, reduction reactions, and synthesis of polyesters and

  3. Probability of coding of a DNA sequence: an algorithm to predict translated reading frames from their thermodynamic characteristics.

    PubMed Central

    Tramontano, A; Macchiato, M F

    1986-01-01

    An algorithm to determine the probability that a reading frame codes for a protein is presented. It is based on the results of our previous studies on the thermodynamic characteristics of a translated reading frame. We also develop a prediction procedure to distinguish between coding and non-coding reading frames. The procedure is based on the characteristics of the putative product of the DNA sequence and not on periodicity characteristics of the sequence, so the prediction is not biased by the presence of overlapping translated reading frames or by the presence of translated reading frames on the complementary DNA strand. PMID:3753761

  4. Heterogeneity of amino acid sequence in hippopotamus cytochrome c.

    PubMed

    Thompson, R B; Borden, D; Tarr, G E; Margoliash, E

    1978-12-25

    The amino acid sequences of chymotryptic and tryptic peptides of Hippopotamus amphibius cytochrome c were determined by a recent modification of the manual Edman sequential degradation procedure. They were ordered by comparison with the structure of the hog protein. The hippopotamus protein differs in three positions: serine, alanine, and glutamine replace alanine, glutamic acid, and lysine in positions 43, 92, and 100, respectively. Since the artiodactyl suborders diverged in the mid-Eocene some 50 million years ago, the fact that representatives of some of them show no differences in their cytochromes c (cow, sheep, and hog), while another exhibits as many as three such differences, verifies that even in relatively closely related lines of descent the rate at which cytochrome c changes in the course of evolution is not constant. Furthermore, 10.6% of the hippopotamus cytochrome c preparation was shown to contain isoleucine instead of valine at position 3, indicating that one of the four animals from which the protein was obtained was heterozygous in the cytochrome c gene. Such heterogeneity is a necessary condition of evolutionary variation and has not been previously observed in the cytochrome c of a wild mammalian population.
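    The link between the 10.6% isoleucine variant and a single heterozygous animal can be checked with simple allele counting. The following is a minimal sketch, assuming each of the four animals contributed equally to the pooled preparation and that both alleles are expressed equally; these assumptions and the function name `expected_variant_fraction` are illustrative and are not stated in the abstract.

```python
from fractions import Fraction


def expected_variant_fraction(n_animals: int, n_heterozygous: int) -> Fraction:
    """Expected share of variant protein in a pooled preparation.

    Each diploid animal carries two gene copies; a heterozygote
    contributes one variant copy out of the 2 * n_animals total.
    """
    return Fraction(n_heterozygous, 2 * n_animals)


one_het = expected_variant_fraction(4, 1)   # one heterozygote among four
two_het = expected_variant_fraction(4, 2)   # two heterozygotes among four
observed = 0.106                            # value reported in the abstract
```

    Under these assumptions one heterozygote predicts 12.5% and two predict 25%, so the observed 10.6% is closest to the single-heterozygote case.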

  5. Sequence of the cDNA and 5'-flanking region for human acid alpha-glucosidase, detection of an intron in the 5' untranslated leader sequence, definition of 18-bp polymorphisms, and differences with previous cDNA and amino acid sequences.

    PubMed

    Martiniuk, F; Mehler, M; Tzall, S; Meredith, G; Hirschhorn, R

    1990-03-01

    Acid maltase or acid alpha-glucosidase (GAA) is a lysosomal enzyme that hydrolyzes glycogen to glucose and is deficient in glycogen storage disease type II. Previously, we isolated a partial cDNA (1.9 kb) for human GAA; we have now used this cDNA to isolate and determine sequence in longer cDNAs from four additional independent cDNA libraries. Primer extension studies indicated that the mRNA extended approximately 200 bp 5' of the cDNA sequence obtained. Therefore, we isolated a genomic fragment containing 5' cDNA sequences that overlapped the previous cDNA sequence and extended an additional 24 bp to an initiation codon within a Kozak consensus sequence. The sequence of the genomic clone revealed an intron-exon junction 32 bp 5' to the ATG, indicating that the 5' leader sequence was interrupted by an intron. The remaining 186 bp of 5' untranslated sequence was identified approximately 3 kb upstream. The promoter region upstream from the start site of transcription was GC rich and contained areas of homology to Sp1 binding sites but no identifiable CAAT or TATA box. The combined data gave a nucleotide sequence of 2,856 bp for the coding region from the ATG to a stop codon, predicting a protein of 952 amino acids. The 3' untranslated region contained 555 bp with a polyadenylation signal at 3,385 bp followed by 16 bp prior to a poly(A) tail. This sequence of the GAA coding region differs from that reported by Hoefsloot et al. (1988) in three areas that change a total of 42 amino acids. Direct determination of the amino acid sequence in one of these areas confirmed the nucleotide sequence reported here but also disagreed with the directly determined amino acid sequence reported by Hoefsloot et al. (1988). At two other areas, changes in base pairs predicted new restriction sites that were identified in cDNAs from several independent libraries. The amino acid changes in all three areas increased the homology to rabbit-human isomaltase. Therefore, we believe that our

  6. Molecular cloning, encoding sequence, and expression of vaccinia virus nucleic acid-dependent nucleoside triphosphatase gene.

    PubMed Central

    Rodriguez, J F; Kahn, J S; Esteban, M

    1986-01-01

    A rabbit poxvirus genomic library contained within the expression vector lambda gt11 was screened with polyclonal antiserum prepared against vaccinia virus nucleic acid-dependent nucleoside triphosphatase (NTPase)-I enzyme. Five positive phage clones containing from 0.72- to 2.5-kilobase-pair (kbp) inserts expressed a beta-galactosidase fusion protein that was reactive by immunoblotting with the NTPase-I antibody. Hybridization analysis allowed the location of this gene within the vaccinia HindIIID restriction fragment. From the known nucleotide sequence of the 16-kbp vaccinia HindIIID fragment, we identified a region that contains a 1896-base open reading frame coding for a 631-amino acid protein. Analysis of the complete sequence revealed a highly basic protein, with hydrophilic COOH and NH2 termini, various hydrophobic domains, and no significant homology to other known proteins. Translational studies demonstrate that NTPase-I belongs to a late class of viral genes. This protein is highly conserved among Orthopoxviruses. PMID:3025846

  7. Molecular cloning, encoding sequence, and expression of vaccinia virus nucleic acid-dependent nucleoside triphosphatase gene.

    PubMed

    Rodriguez, J F; Kahn, J S; Esteban, M

    1986-12-01

    A rabbit poxvirus genomic library contained within the expression vector lambda gt11 was screened with polyclonal antiserum prepared against vaccinia virus nucleic acid-dependent nucleoside triphosphatase (NTPase)-I enzyme. Five positive phage clones containing from 0.72- to 2.5-kilobase-pair (kbp) inserts expressed a beta-galactosidase fusion protein that was reactive by immunoblotting with the NTPase-I antibody. Hybridization analysis allowed the location of this gene within the vaccinia HindIIID restriction fragment. From the known nucleotide sequence of the 16-kbp vaccinia HindIIID fragment, we identified a region that contains a 1896-base open reading frame coding for a 631-amino acid protein. Analysis of the complete sequence revealed a highly basic protein, with hydrophilic COOH and NH2 termini, various hydrophobic domains, and no significant homology to other known proteins. Translational studies demonstrate that NTPase-I belongs to a late class of viral genes. This protein is highly conserved among Orthopoxviruses.
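    The reported open reading frame length and protein length are consistent with each other if the stop codon is counted as part of the frame. The following is a minimal sketch of that arithmetic; the function name `orf_length_nt` is hypothetical, and counting the stop codon is an assumption about how the 1896-base figure was defined.

```python
def orf_length_nt(n_residues: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode n_residues amino acids.

    Each residue takes one 3-base codon; the terminating stop codon
    adds 3 more bases when it is counted as part of the frame.
    """
    return 3 * n_residues + (3 if include_stop else 0)


ntpase_orf = orf_length_nt(631)                             # with stop codon
ntpase_orf_no_stop = orf_length_nt(631, include_stop=False)  # sense codons only
```

    With the stop codon included, 631 residues correspond to 1896 bases, matching the abstract.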

  8. Identification and characterization of small non-coding RNAs from Chinese fir by high throughput sequencing

    PubMed Central

    2012-01-01

    Background: Small non-coding RNAs (sRNAs) play key roles in plant development, growth and responses to biotic and abiotic stresses. At least four classes of sRNAs have been well characterized in plants, including repeat-associated siRNAs (rasiRNAs), microRNAs (miRNAs), trans-acting siRNAs (tasiRNAs) and natural antisense transcript-derived siRNAs. Chinese fir (Cunninghamia lanceolata) is one of the most important coniferous evergreen tree species in China. No sRNA from Chinese fir has been described to date. Results: To obtain sRNAs in Chinese fir, we sequenced an sRNA library generated from seeds, seedlings, leaves, stems and calli, using Illumina high throughput sequencing technology. A comprehensive set of sRNAs was acquired, including conserved and novel miRNAs, rasiRNAs and tasiRNAs. With BLASTN and MIREAP we identified a total of 115 conserved miRNAs comprising 40 miRNA families and one novel miRNA with precursor sequence. The expression of 16 conserved miRNAs, one novel miRNA and one tasiRNA was detected by RT-PCR. Utilizing real time RT-PCR, we revealed that four conserved miRNAs and one novel miRNA displayed developmental stage-specific expression patterns in Chinese fir. In addition, 209 unigenes were predicted to be targets of 30 Chinese fir miRNA families, of which five target genes were experimentally verified by 5' RACE, including a squamosa promoter-binding protein gene, a pentatricopeptide (PPR) repeat-containing protein gene, a BolA-like family protein gene, AGO1 and a gene of unknown function. We also demonstrated that the DCL3-dependent rasiRNA biogenesis pathway, which had been considered absent in conifers, existed in Chinese fir. Furthermore, the miR390-TAS3-ARF regulatory pathway was elucidated. Conclusions: We unveiled a complex population of sRNAs in Chinese fir through high throughput sequencing. This provides an insight into the composition and function of sRNAs in Chinese fir and sheds new light on land plant sRNA evolution. PMID:22894611

  9. Human liver apolipoprotein B-100 cDNA: complete nucleic acid and derived amino acid sequence.

    PubMed Central

    Law, S W; Grant, S M; Higuchi, K; Hospattankar, A; Lackner, K; Lee, N; Brewer, H B

    1986-01-01

    Human apolipoprotein B-100 (apoB-100), the ligand on low density lipoproteins that interacts with the low density lipoprotein receptor and initiates receptor-mediated endocytosis and low density lipoprotein catabolism, has been cloned, and the complete nucleic acid and derived amino acid sequences have been determined. ApoB-100 cDNAs were isolated from normal human liver cDNA libraries utilizing immunoscreening as well as filter hybridization with radiolabeled apoB-100 oligodeoxynucleotides. The apoB-100 mRNA is 14.1 kilobases long encoding a mature apoB-100 protein of 4536 amino acids with a calculated amino acid molecular weight of 512,723. ApoB-100 contains 20 potential glycosylation sites, and 12 of a total of 25 cysteine residues are located in the amino-terminal region of the apolipoprotein providing a potential globular structure of the amino terminus of the protein. ApoB-100 contains relatively few regions of amphipathic helices, but compared to other human apolipoproteins it is enriched in beta-structure. The delineation of the entire human apoB-100 sequence will now permit a detailed analysis of the conformation of the protein, the low density lipoprotein receptor binding domain(s), and the structural relationship between apoB-100 and apoB-48 and will provide the basis for the study of genetic defects in apoB-100 in patients with dyslipoproteinemias. PMID:3464946

  10. The evolution of proteins from random amino acid sequences: II. Evidence from the statistical distributions of the lengths of modern protein sequences.

    PubMed

    White, S H

    1994-04-01

    entirely consistent with the observations of Brown et al. (1990a,b, Nucleic Acids Res 18:2079-2086 and 18: 6339-6345) which show that tetra-nucleotides (stop codon plus following nucleotide) are the actual signals for termination of translation in both prokaryotes and eukaryotes. Second, the strong dependence of statistical length distributions on sequence-termination signaling codes implies that the evolution of stop codons and translation-termination processes was as important as gene splicing in early evolution. Third, because the theory is based upon a simple no-exon stochastic model, it provides a plausible alternative to a limited universe of exons from which all proteins evolved by gene duplication and exon splicing (Dorit et al. 1990, Science 250:1377-1382).

  11. Isolation and nucleotide sequence of mouse NCAM cDNA that codes for a Mr 79,000 polypeptide without a membrane-spanning region.

    PubMed Central

    Barthels, D; Santoni, M J; Wille, W; Ruppert, C; Chaix, J C; Hirsch, M R; Fontecilla-Camps, J C; Goridis, C

    1987-01-01

    The neural cell adhesion molecule (NCAM) exists in several isoforms which are selectively expressed by different cell types and at different stages of development. In the mouse, three proteins with apparent Mr's of 180,000, 140,000 and 120,000 have been distinguished that are encoded by 4-5 different mRNAs. Here we report the full amino acid sequence of a NCAM protein inferred from the sequences of overlapping cDNA clones. The 706-residue polypeptide contains, towards its N-terminus, 5 domains that share structural homology with members of the immunoglobulin supergene family. The sequence does not encode a typical membrane-spanning segment, but ends with 24 uncharged amino acids followed by two stop codons. This fact, together with size considerations, makes it highly likely that our sequence represents NCAM-120, which lacks transmembrane or cytoplasmic domains and is attached to the membrane by phospholipid. Probes from the 5' region detect all four NCAM gene transcripts present in mouse brain, consistent with the notion that the extracellular domains are common to most NCAM forms. However, a 3' probe corresponding to the hydrophobic tail and non-coding region hybridizes specifically with the smallest mRNA species. S1 nuclease protection experiments indicate that this region is encoded by exon(s) spliced out from the other mRNAs. Furthermore, our clones, which are highly homologous to a published chicken NCAM sequence that codes for putative transmembrane and cytoplasmic domains elsewhere, diverge from it at the presumptive splice junction. It appears thus that alternate use of exons determines whether NCAM proteins with membrane-spanning domains are synthesized.(ABSTRACT TRUNCATED AT 250 WORDS) PMID:3595563

  12. The Use of Coded PCR Primers Enables High-Throughput Sequencing of Multiple Homolog Amplification Products by 454 Parallel Sequencing

    PubMed Central

    Bollback, Jonathan P.; Panitz, Frank; Bendixen, Christian; Nielsen, Rasmus; Willerslev, Eske

    2007-01-01

    Background: The invention of the Genome Sequencer 20™ DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. Methodology: We use conventional PCR with 5′-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequencer 20™ DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5′ tag analysis. Conclusions: We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (misassignment rate <0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in a single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5′ nucleotide of the tag. In particular, primers 5′ labelled with a cytosine are heavily overrepresented among the final sequences, while those 5′ labelled with a thymine are strongly underrepresented. A weaker bias also exists with regard to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5′ primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of
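
    The tag-based assignment step described in this record can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the tag table, specimen names and reads below are hypothetical, the tag length of two is taken from the abstract's mention of dinucleotide tags, and only exact tag matches are handled (no correction for sequencing anomalies).

```python
from collections import defaultdict

def assign_reads_by_tag(reads, tag_to_source, tag_length=2):
    """Group reads by the tag found at their 5' end.

    Reads whose tag is not in the table are collected under 'unassigned'.
    The tag itself is trimmed from each stored read.
    """
    assigned = defaultdict(list)
    for read in reads:
        tag = read[:tag_length].upper()
        source = tag_to_source.get(tag, "unassigned")
        assigned[source].append(read[tag_length:])
    return dict(assigned)

# Hypothetical dinucleotide tags and reads, for illustration only.
tag_to_source = {"AC": "specimen_1", "CA": "specimen_2", "GT": "specimen_3"}
reads = ["ACGGTTAACC", "CATTGGCCAA", "GTAACCGGTT", "TTACGTACGT"]

result = assign_reads_by_tag(reads, tag_to_source)
for source, source_reads in sorted(result.items()):
    print(source, source_reads)
```

    Counting the reads per tag in such a table is also one simple way to look for the kind of tag-dependent representation bias the abstract reports.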

  13. Coding potential and transcript analysis of fowl adenovirus 4: insight into upstream ORFs as common sequence features in adenoviral transcripts.

    PubMed

    Griffin, Bryan D; Nagy, Eva

    2011-06-01

    Recombinant fowl adenoviruses (FAdVs) have been successfully used as veterinary vaccine vectors. However, insufficient definitions of the protein-coding and non-coding regions and an incomplete understanding of virus-host interactions limit the progress of next-generation vectors. FAdVs are known to cause several diseases of poultry. Certain isolates of species FAdV-C are the aetiological agent of inclusion body hepatitis/hydropericardium syndrome (IBH/HPS). In this study, we report the complete 45667 bp genome sequence of FAdV-4 of species FAdV-C. Assessment of the protein-coding potential of FAdV-4 was carried out with the Bio-Dictionary-based Gene Finder together with an evaluation of sequence conservation among species FAdV-A and FAdV-D. On this basis, 46 potentially protein-coding ORFs were identified. Of these, 33 and 13 ORFs were assigned high and low protein-coding potential, respectively. Homologues of the ancestral adenoviral genes were, with few exceptions, assigned high protein-coding potential. ORFs that were unique to the FAdVs were differentiated into high and low protein-coding potential groups. Notable putative genes with high protein-coding capacity included the previously unreported fiber 1, hypothetical 10.3K and hypothetical 10.5K genes. Transcript analysis revealed that several of the small ORFs less than 300 nt in length that were assigned low coding potential contributed to upstream ORFs (uORFs) in important mRNAs, including the ORF22 mRNA. Subsequent analysis of the previously reported transcripts of FAdV-1, FAdV-9, human adenovirus 2 and bovine adenovirus 3 identified widespread uORFs in AdV mRNAs that have the potential to act as important translational regulatory elements.
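
    To make the notion of an upstream ORF concrete, the following Python sketch scans a transcript for forward-strand ORFs (ATG to the first in-frame stop codon) and reports those that start before the main ORF. It is only an illustration under simple assumptions: the transcript and the main ORF start position are invented, and the study's actual ORF prediction relied on the Bio-Dictionary-based Gene Finder and sequence conservation, not on this scan.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq):
    """Return (start, end) pairs for every ATG...stop ORF on the forward strand.

    Coordinates are 0-based; 'end' is exclusive and includes the stop codon.
    ATGs with no in-frame stop codon are ignored.
    """
    seq = seq.upper()
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                orfs.append((start, pos + 3))
                break
    return orfs

def upstream_orfs(mrna, main_orf_start):
    """Return ORFs whose start codon lies 5' of the main ORF start codon."""
    return [(s, e) for s, e in find_orfs(mrna) if s < main_orf_start]

# Hypothetical transcript: a short uORF at position 2, main ORF at position 13.
mrna = "GGATGAAATAGCCATGGCTGCTTAAGG"
print(find_orfs(mrna))
print(upstream_orfs(mrna, main_orf_start=13))
```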

  14. Phylogenetic analysis of evolutionary relationships of the planctomycete division of the domain bacteria based on amino acid sequences of elongation factor Tu.

    PubMed

    Jenkins, C; Fuerst, J A

    2001-05-01

    Sequences from the tuf gene coding for the elongation factor EF-Tu were amplified and sequenced from the genomic DNA of Pirellula marina and Isosphaera pallida, two species of bacteria within the order Planctomycetales. A near-complete (1140-bp) sequence was obtained from Pi. marina and a partial (759-bp) sequence was obtained for I. pallida. Alignment of the deduced Pi. marina EF-Tu amino acid sequence against reference sequences demonstrated the presence of a unique 11-amino acid sequence motif not present in any other division of the domain Bacteria. Pi. marina shared the highest percentage amino acid sequence identity with I. pallida but showed only a low percentage identity with other members of the domain Bacteria. This is consistent with the concept of the planctomycetes as a unique division of the Bacteria. Neither primary sequence comparison of EF-Tu nor phylogenetic analysis supports any close relationship between planctomycetes and the chlamydiae, which has previously been postulated on the basis of 16S rRNA. Phylogenetic analysis of aligned EF-Tu amino acid sequences performed using distance, maximum-parsimony, and maximum-likelihood approaches yielded contradictory results with respect to the position of planctomycetes relative to other bacteria. It is hypothesized that long-branch attraction effects due to unequal evolutionary rates and mutational saturation effects may account for some of the contradictions. PMID:11443344

  15. Conserved Non-Coding Sequences are Associated with Rates of mRNA Decay in Arabidopsis.

    PubMed

    Spangler, Jacob B; Feltus, Frank Alex

    2013-01-01

    Steady-state mRNA levels are tightly regulated through a combination of transcriptional and post-transcriptional control mechanisms. The discovery of cis-acting DNA elements that encode these control mechanisms is of high importance. We have investigated the influence of conserved non-coding sequences (CNSs), DNA patterns retained after an ancient whole genome duplication event, on the breadth of gene expression and the rates of mRNA decay in Arabidopsis thaliana. The absence of CNSs near α duplicate genes was associated with a decrease in breadth of gene expression and slower mRNA decay rates, while the presence of CNSs near α duplicates was associated with an increase in breadth of gene expression and faster mRNA decay rates. The observed difference in mRNA decay rate was fastest in genes with CNSs in both non-transcribed and transcribed regions, albeit through an unknown mechanism. This study supports the notion that some Arabidopsis CNSs regulate the steady-state mRNA levels through post-transcriptional control mechanisms and that CNSs also play a role in controlling the breadth of gene expression.

  16. Mice carrying a complete deletion of the talin2 coding sequence are viable and fertile

    SciTech Connect

    Debrand, Emmanuel; Conti, Francesco J.; Bate, Neil; Spence, Lorraine; Mazzeo, Daniela; Pritchard, Catrin A.; Monkley, Susan J.; Critchley, David R.

    2012-09-21

    Highlights: Mice lacking talin2 are viable and fertile with only a mildly dystrophic phenotype. Talin2 null fibroblasts show no major defects in proliferation, adhesion or migration. Maintaining a colony of talin2 null mice is difficult, indicating an underlying defect. Abstract: Mice homozygous for several Tln2 gene targeted alleles are viable and fertile. Here we show that although the expression of talin2 protein is drastically reduced in muscle from these mice, other tissues continue to express talin2 albeit at reduced levels. We therefore generated a Tln2 allele lacking the entire coding sequence (Tln2(cd)). Tln2(cd/cd) mice were viable and fertile, and the genotypes of Tln2(cd/+) intercrosses were at the expected Mendelian ratio. Tln2(cd/cd) mice showed no major difference in body mass or the weight of the major organs compared to wild-type, although they displayed a mildly dystrophic phenotype. Moreover, Tln2(cd/cd) mouse embryo fibroblasts showed no obvious defects in cell adhesion, migration or proliferation. However, the number of Tln2(cd/cd) pups surviving to adulthood was variable, suggesting that such mice have an underlying defect.

  17. Inquiries into the structure-function relationship of ribonuclease T1 using chemically synthesized coding sequences.

    PubMed Central

    Ikehara, M; Ohtsuka, E; Tokunaga, T; Nishikawa, S; Uesugi, S; Tanaka, T; Aoyama, Y; Kikyodani, S; Fujimoto, K; Yanase, K

    1986-01-01

    The genes for ribonuclease T1 and its site-specific mutants were chemically synthesized and introduced to Escherichia coli. All enzymes were fusion products produced by joining the synthetic gene at specific restriction sites to the synthetic gene for human growth hormone in a plasmid containing the E. coli trp promoter. The fusion protein from this plasmid contained 66% of the amino-terminal sequences of the human growth hormone, which were recognizable immunologically. RNase T1 or its mutants were cleaved from the fusion protein with cyanogen bromide. The synthetic RNase T1 endowed with the revised wild-type triad Gly-Ser-Pro, residues 71-73, was fully functional, readily hydrolyzing pGpC bonds, whereas a mutant enzyme having the originally reported, erroneous triad Pro-Gly-Ser was totally inactive. Various amino acid substitutions were also introduced to the guanosine recognition region comprised of residues 42-45, Tyr-Asn-Asn-Tyr. Substitution of either of the tyrosine residues noted above with phenylalanine had no dramatic effect on the enzyme's function. Replacement of asparagine-43 with arginine or alanine also caused only a small change in the hydrolyzing activity--a mutant enzyme maintained greater than 50% of the wild-type activity. In sharp contrast, when aspartic acid or alanine was substituted for asparagine-44, the activity was dramatically reduced to a few percent of the wild-type activity. PMID:3014504

  18. Human retroviruses and AIDS 1996. A compilation and analysis of nucleic acid and amino acid sequences

    SciTech Connect

    Myers, G.; Foley, B.; Korber, B.; Mellors, J.W.; Jeang, K.T.; Wain-Hobson, S.

    1997-04-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (1) Nucleic Acid Alignments and Sequences; (2) Amino Acid Alignments; (3) Analysis; (4) Related Sequences; and (5) Database Communications. Information within all the parts is updated throughout the year on the Web site, http://hiv-web.lanl.gov. While this publication could take the form of a review or sequence monograph, it is not so conceived. Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions of the parts of the compendium, the user should read the individual introductions for each part.

  19. Source coherence impairments in a direct detection direct sequence optical code-division multiple-access system.

    PubMed

    Fsaifes, Ihsan; Lepers, Catherine; Lourdiane, Mounia; Gallion, Philippe; Beugin, Vincent; Guignard, Philippe

    2007-02-01

    We demonstrate that direct sequence optical code-division multiple-access (DS-OCDMA) encoders and decoders using sampled fiber Bragg gratings (S-FBGs) behave as multipath interferometers. In that case, chip pulses of the prime sequence codes generated by spreading in time-coherent data pulses can result from multiple reflections in the interferometers that can superimpose within a chip time duration. We show that the autocorrelation function has to be considered as the sum of complex amplitudes of the combined chip as the laser source coherence time is much greater than the integration time of the photodetector. To reduce the sensitivity of the DS-OCDMA system to the coherence time of the laser source, we analyze the use of sparse and nonperiodic quadratic congruence and extended quadratic congruence codes.

  20. Deep sequencing of the tobacco mitochondrial transcriptome reveals expressed ORFs and numerous editing sites outside coding regions

    PubMed Central

    2014-01-01

    Background The purpose of this study was to sequence and assemble the tobacco mitochondrial transcriptome and obtain a genomic-level view of steady-state RNA abundance. Plant mitochondrial genomes have a small number of protein coding genes with large and variably sized intergenic spaces. In the tobacco mitogenome these intergenic spaces contain numerous open reading frames (ORFs) with no clear function. Results The assembled transcriptome revealed distinct monocistronic and polycistronic transcripts along with large intergenic spaces with little to no detectable RNA. Eighteen of the 117 ORFs were found to have steady-state RNA amounts above background in both deep-sequencing and qRT-PCR experiments and ten of those were found to be polysome associated. In addition, the assembled transcriptome enabled a full mitogenome screen of RNA C→U editing sites. Six hundred and thirty five potential edits were found with 557 occurring within protein-coding genes, five in tRNA genes, and 73 in non-coding regions. These sites were found in every protein-coding transcript in the tobacco mitogenome. Conclusion These results suggest that a small number of the ORFs within the tobacco mitogenome may produce functional proteins and that RNA editing occurs in coding and non-coding regions of mitochondrial transcripts. PMID:24433288

  1. Nucleotide sequence of cDNA coding for dianthin 30, a ribosome inactivating protein from Dianthus caryophyllus.

    PubMed

    Legname, G; Bellosta, P; Gromo, G; Modena, D; Keen, J N; Roberts, L M; Lord, J M

    1991-08-27

    Rabbit antibodies raised against dianthin 30, a ribosome inactivating protein from carnation (Dianthus caryophyllus) leaves, were used to identify a full length dianthin precursor cDNA clone from a lambda gt11 expression library. N-terminal amino acid sequencing of purified dianthin 30 and dianthin 32 confirmed that the clone encoded dianthin 30. The cDNA was 1153 basepairs in length and encoded a precursor protein of 293 amino acid residues. The first 23 N-terminal amino acids of the precursor represented the signal sequence. The protein contained a carboxy-terminal region which, by analogy with barley lectin, may contain a vacuolar targeting signal.

  2. Comparisons between Arabidopsis thaliana and Drosophila melanogaster in relation to Coding and Noncoding Sequence Length and Gene Expression

    PubMed Central

    Caldwell, Rachel; Lin, Yan-Xia; Zhang, Ren

    2015-01-01

    There is a continuing interest in the analysis of gene architecture and gene expression to determine the relationship that may exist. Advances in high-quality sequencing technologies and large-scale resource datasets have increased the understanding of relationships and cross-referencing of expression data to the large genome data. Although a negative correlation between expression level and gene (especially transcript) length has been generally accepted, there have been some conflicting results arising from the literature concerning the impacts of different regions of genes, and the underlying reason is not well understood. The research aims to apply quantile regression techniques for statistical analysis of coding and noncoding sequence length and gene expression data in the plant, Arabidopsis thaliana, and fruit fly, Drosophila melanogaster, to determine if a relationship exists and if there is any variation or similarities between these species. The quantile regression analysis found that the coding sequence length and gene expression correlations varied, and similarities emerged for the noncoding sequence length (5′ and 3′ UTRs) between animal and plant species. In conclusion, the information described in this study provides the basis for further exploration into gene regulation with regard to coding and noncoding sequence length. PMID:26114098

  3. Complete Genome Sequence of the d-Amino Acid Catabolism Bacterium Phaeobacter sp. Strain JL2886, Isolated from Deep Seawater of the South China Sea.

    PubMed

    Fu, Yingnan; Wang, Rui; Zhang, Zilian; Jiao, Nianzhi

    2016-01-01

    Phaeobacter sp. strain JL2886, isolated from deep seawater of the South China Sea, can catabolize d-amino acids. Here, we report the complete genome sequence of Phaeobacter sp. JL2886. It comprises ~4.06 Mbp, with a G+C content of 61.52%. A total of 3,913 protein-coding genes and 10 genes related to d-amino acid catabolism were obtained. PMID:27587825

  4. Complete Genome Sequence of the d-Amino Acid Catabolism Bacterium Phaeobacter sp. Strain JL2886, Isolated from Deep Seawater of the South China Sea

    PubMed Central

    Fu, Yingnan; Wang, Rui

    2016-01-01

    Phaeobacter sp. strain JL2886, isolated from deep seawater of the South China Sea, can catabolize d-amino acids. Here, we report the complete genome sequence of Phaeobacter sp. JL2886. It comprises ~4.06 Mbp, with a G+C content of 61.52%. A total of 3,913 protein-coding genes and 10 genes related to d-amino acid catabolism were obtained. PMID:27587825

  6. Evolution of vertebrate IgM: complete amino acid sequence of the constant region of Ambystoma mexicanum mu chain deduced from cDNA sequence.

    PubMed

    Fellah, J S; Wiles, M V; Charlemagne, J; Schwager, J

    1992-10-01

    cDNA clones coding for the constant region of the Mexican axolotl (Ambystoma mexicanum) mu heavy immunoglobulin chain were selected from total spleen RNA, using a cDNA polymerase chain reaction technique. The specific 5'-end primer was an oligonucleotide homologous to the JH segment of Xenopus laevis mu chain. One of the clones, JHA/3, corresponded to the complete constant region of the axolotl mu chain, consisting of a 1362-nucleotide sequence coding for a polypeptide of 454 amino acids followed in the 3' direction by a 179-nucleotide untranslated region and a polyA+ tail. The axolotl C mu is divided into four typical domains (C mu 1-C mu 4) and can be aligned with the Xenopus C mu with an overall identity of 56% at the nucleotide level. Percent identities were particularly high between C mu 1 (59%) and C mu 4 (71%). The C-terminal 20-amino acid segment which constitutes the secretory part of the mu chain is strongly homologous to the equivalent sequences of chondrichthyans and of other tetrapods, including a conserved N-linked oligosaccharide, the penultimate cysteine and the C-terminal lysine. The four C mu domains of 13 vertebrate species ranging from chondrichthyans to mammals were aligned and compared at the amino acid level. The significant number of mu-specific residues which are conserved in each of the four C mu domains argues for a continuous line of evolution of the vertebrate mu chain. This notion was confirmed by the ability to reconstitute a consistent vertebrate evolution tree based on the phylogenetic parsimony analysis of the C mu 4 sequences. PMID:1382992

  8. Natural vs. random protein sequences: Discovering combinatorics properties on amino acid words.

    PubMed

    Santoni, Daniele; Felici, Giovanni; Vergni, Davide

    2016-02-21

    Random mutations and natural selection have driven the evolution of the protein amino acid sequences that we observe at present in nature. The question of which is the dominant force in protein evolution still lacks an unambiguous answer. Random mutations tend to randomize protein sequences while, in order to have the correct functionality, one expects that selection mechanisms impose rigid constraints on amino acid sequences. Moreover, one also has to consider that the space of all possible amino acid sequences is so astonishingly large that it could be reasonable to have a well-tuned amino acid sequence indistinguishable from a random one. In order to study the possibility of discriminating between random and natural amino acid sequences, we introduce different measures of association between pairs of amino acids in a sequence, and apply them to a dataset of 1047 natural protein sequences and 10,470 random sequences, carefully generated in order to preserve the relative length and amino acid distribution of the natural proteins. We analyze the multidimensional measures with machine learning techniques and show that, to a reasonable extent, natural protein sequences can be differentiated from random ones. PMID:26656109
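
    One simple way to build random controls that keep the length and amino acid composition of a natural protein is to shuffle its residues. The Python sketch below shows that idea with ten controls per natural sequence, matching the 1047 to 10,470 ratio in the abstract. Shuffling is an assumption made here for illustration; the abstract does not state how the random sequences were generated, and the example peptide is invented.

```python
import random
from collections import Counter

def shuffled_control(sequence, rng):
    """Return a random permutation of the residues in 'sequence'.

    Length and amino acid composition are preserved exactly;
    only the residue order is randomized.
    """
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
natural = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical peptide
controls = [shuffled_control(natural, rng) for _ in range(10)]

# Every control has the same composition as the natural sequence.
print(all(Counter(c) == Counter(natural) for c in controls))
```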

  9. A novel all-optical label processing based on multiple optical orthogonal codes sequences for optical packet switching networks

    NASA Astrophysics Data System (ADS)

    Zhang, Chongfu; Qiu, Kun; Xu, Bo; Ling, Yun

    2008-05-01

    This paper proposes an all-optical label processing scheme that uses the multiple optical orthogonal codes sequences (MOOCS)-based optical label for optical packet switching (OPS) (MOOCS-OPS) networks. In this scheme, each MOOCS is a permutation or combination of the multiple optical orthogonal codes (MOOC) selected from the multiple-groups optical orthogonal codes (MGOOC). Following a comparison of different optical label processing (OLP) schemes, the principles of MOOCS-OPS network are given and analyzed. Firstly, theoretical analyses are used to prove that MOOCS is able to greatly enlarge the number of available optical labels when compared to the previous single optical orthogonal code (SOOC) for OPS (SOOC-OPS) network. Then, the key units of the MOOCS-based optical label packets, including optical packet generation, optical label erasing, optical label extraction and optical label rewriting etc., are given and studied. These results are used to verify that the proposed MOOCS-OPS scheme is feasible.

  10. The cDNA-derived amino acid sequence of hemoglobin II from Lucina pectinata.

    PubMed

    Torres-Mercado, Elineth; Renta, Jessicca Y; Rodríguez, Yolanda; López-Garriga, Juan; Cadilla, Carmen L

    2003-11-01

    Hemoglobin II from the clam Lucina pectinata is an oxygen-reactive protein with a unique structural organization in the heme pocket involving residues Gln65 (E7), Tyr30 (B10), Phe44 (CD1), and Phe69 (E11). We employed reverse transcriptase-polymerase chain reaction (RT-PCR) and rapid amplification of cDNA ends (RACE) methods to synthesize various cDNA(HbII). An initial 300-bp cDNA clone was amplified from total RNA by RT-PCR using degenerate oligonucleotides. Gene-specific primers derived from the HbII-partial cDNA sequence were used to obtain the 5' and 3' ends of the cDNA by RACE. The length of the HbII cDNA, estimated from overlapping clones, was approximately 2114 bases. Northern blot analysis revealed that the mRNA size of HbII agrees with the estimated size using cDNA data. The coding region of the full-length HbII cDNA codes for 151 amino acids. The calculated molecular weight of HbII, including the heme group and acetylated N-terminal residue, is 17,654.07 Da.

  11. HybPiper: Extracting coding sequence and introns for phylogenetics from high-throughput sequencing reads using target enrichment

    PubMed Central

    Johnson, Matthew G.; Gardner, Elliot M.; Liu, Yang; Medina, Rafael; Goffinet, Bernard; Shaw, A. Jonathan; Zerega, Nyree J. C.; Wickett, Norman J.

    2016-01-01

    Premise of the study: Using sequence data generated via target enrichment for phylogenetics requires reassembly of high-throughput sequence reads into loci, presenting a number of bioinformatics challenges. We developed HybPiper as a user-friendly platform for assembly of gene regions, extraction of exon and intron sequences, and identification of paralogous gene copies. We test HybPiper using baits designed to target 333 phylogenetic markers and 125 genes of functional significance in Artocarpus (Moraceae). Methods and Results: HybPiper implements parallel execution of sequence assembly in three phases: read mapping, contig assembly, and target sequence extraction. The pipeline was able to recover nearly complete gene sequences for all genes in 22 species of Artocarpus. HybPiper also recovered more than 500 bp of nontargeted intron sequence in over half of the phylogenetic markers and identified paralogous gene copies in Artocarpus. Conclusions: HybPiper was designed for Linux and Mac OS X and is freely available at https://github.com/mossmatters/HybPiper. PMID:27437175

  12. Proline might have been the first amino acid in the primitive genetic code.

    PubMed

    Komatsu, Reina; Sawada, Risa; Umehara, Takuya; Tamura, Koji

    2014-06-01

    Stereochemical assignment of amino acids and corresponding codons or anticodons has not been successful so far. Here, we focused on proline and GGG (anticodon of tRNA(Pro)) and investigated their mutual interaction. Circular dichroism spectroscopy revealed that guanosine nucleotides (GG, GGG) formed G-quartet structures. The structures were destroyed by adding high concentrations of proline. We propose that the possibility of the reversible proline/G-quartet interaction could have contributed to the specific assignment of proline on GGG and that this coding could have been the first in the genetic code. PMID:24973301

  13. Identification of novel Arabidopsis thaliana upstream open reading frames that control expression of the main coding sequences in a peptide sequence-dependent manner

    PubMed Central

    Ebina, Isao; Takemoto-Tsutsumi, Mariko; Watanabe, Shun; Koyama, Hiroaki; Endo, Yayoi; Kimata, Kaori; Igarashi, Takuya; Murakami, Karin; Kudo, Rin; Ohsumi, Arisa; Noh, Abdul Latif; Takahashi, Hiro; Naito, Satoshi; Onouchi, Hitoshi

    2015-01-01

    Upstream open reading frames (uORFs) are often found in the 5′-leader regions of eukaryotic mRNAs and can negatively modulate the translational efficiency of the downstream main ORF. Although the effects of most uORFs are thought to be independent of their encoded peptide sequences, certain uORFs control translation of the main ORF in a peptide sequence-dependent manner. For genome-wide identification of such peptide sequence-dependent regulatory uORFs, exhaustive searches for uORFs with conserved amino acid sequences have been conducted using bioinformatic analyses. However, whether the conserved uORFs identified by these bioinformatic approaches encode regulatory peptides has not been experimentally determined. Here we analyzed 16 recently identified Arabidopsis thaliana conserved uORFs for the effects of their amino acid sequences on the expression of the main ORF using a transient expression assay. We identified five novel uORFs that repress main ORF expression in a peptide sequence-dependent manner. Mutational analysis revealed that, in four of them, the C-terminal region of the uORF-encoded peptide is critical for the repression of main ORF expression. Intriguingly, we also identified one exceptional sequence-dependent regulatory uORF, in which the stop codon position is not conserved and the C-terminal region is not important for the repression of main ORF expression. PMID:25618853

  14. GeneFizz: A web tool to compare genetic (coding/non-coding) and physical (helix/coil) segmentations of DNA sequences. Gene discovery and evolutionary perspectives.

    PubMed

    Yeramian, Edouard; Jones, Louis

    2003-07-01

    The GeneFizz (http://pbga.pasteur.fr/GeneFizz) web tool permits the direct comparison between two types of segmentations for DNA sequences (possibly annotated): the coding/non-coding segmentation associated with genomic annotations (simple genes or exons in split genes) and the physics-based structural segmentation between helix and coil domains (as provided by the classical helix-coil model). There appears to be a varying degree of coincidence for different genomes between the two types of segmentations, from almost perfect to non-relevant. Following these two extremes, GeneFizz can be used for two purposes: ab initio physics-based identification of new genes (as recently shown for Plasmodium falciparum) or the exploration of possible evolutionary signals revealed by the discrepancies observed between the two types of information.

  15. NCAD, a database integrating the intrinsic conformational preferences of non-coded amino acids

    PubMed Central

    Revilla-López, Guillem; Torras, Juan; Curcó, David; Casanovas, Jordi; Calaza, M. Isabel; Zanuy, David; Jiménez, Ana I.; Cativiela, Carlos; Nussinov, Ruth; Grodzinski, Piotr; Alemán, Carlos

    2010-01-01

    Peptides and proteins find an ever-increasing number of applications in the biomedical and materials engineering fields. The use of non-proteinogenic amino acids endowed with diverse physicochemical and structural features opens the possibility to design proteins and peptides with novel properties and functions. Moreover, non-proteinogenic residues are particularly useful to control the three-dimensional arrangement of peptidic chains, which is a crucial issue for most applications. However, information regarding such amino acids (also called non-coded, non-canonical or non-standard) is usually scattered among publications specialized in quite diverse fields as well as in patents. Making all these data useful to the scientific community requires new tools and a framework for their assembly and coherent organization. We have successfully compiled, organized and built a database (NCAD, Non-Coded Amino acids Database) containing information about the intrinsic conformational preferences of non-proteinogenic residues determined by quantum mechanical calculations, as well as bibliographic information about their synthesis, physical and spectroscopic characterization, conformational propensities established experimentally, and applications. The architecture of the database is presented in this work together with the first family of non-coded residues included, namely, α-tetrasubstituted α-amino acids. Furthermore, the usefulness of NCAD is demonstrated through a test-case application example. PMID:20455555

  16. Complete genome sequence of Enterococcus mundtii QU 25, an efficient L-(+)-lactic acid-producing bacterium.

    PubMed

    Shiwa, Yuh; Yanase, Hiroaki; Hirose, Yuu; Satomi, Shohei; Araya-Kojima, Tomoko; Watanabe, Satoru; Zendo, Takeshi; Chibazakura, Taku; Shimizu-Kadota, Mariko; Yoshikawa, Hirofumi; Sonomoto, Kenji

    2014-08-01

    Enterococcus mundtii QU 25, a non-dairy bacterial strain of ovine faecal origin, can ferment both cellobiose and xylose to produce l-lactic acid. The use of this strain is highly desirable for economical l-lactate production from renewable biomass substrates. Genome sequence determination is necessary for the genetic improvement of this strain. We report the complete genome sequence of strain QU 25, primarily determined using Pacific Biosciences sequencing technology. The E. mundtii QU 25 genome comprises a 3 022 186-bp single circular chromosome (GC content, 38.6%) and five circular plasmids: pQY182, pQY082, pQY039, pQY024, and pQY003. In all, 2900 protein-coding sequences, 63 tRNA genes, and 6 rRNA operons were predicted in the QU 25 chromosome. Plasmid pQY024 harbours genes for mundticin production. We found that strain QU 25 produces a bacteriocin, suggesting that mundticin-encoded genes on plasmid pQY024 were functional. For lactic acid fermentation, two gene clusters were identified: one involved in the initial metabolism of xylose and uptake of pentose and the second containing genes for the pentose phosphate pathway and uptake of related sugars. This is the first complete genome sequence of an E. mundtii strain. The data provide insights into lactate production in this bacterium and its evolution among enterococci.

  17. Rational design of translational pausing without altering the amino acid sequence dramatically promotes soluble protein expression: a strategic demonstration.

    PubMed

    Chen, Wei; Jin, Jingjie; Gu, Wei; Wei, Bo; Lei, Yun; Xiong, Sheng; Zhang, Gong

    2014-11-10

    The production of many pharmaceutical and industrial proteins in prokaryotic hosts is hindered by the insolubility of industrial expression products resulting from misfolding. Even with a correct primary sequence, an improper translation elongation rate in a heterologous expression system is an important cause of misfolding. In silico analysis revealed that most of the endogenous Escherichia coli genes display translational pausing sites that promote correct folding, and almost 1/5 of genes have pausing sites at the 3'-termini of their coding sequence. Therefore, we established a novel strategy to efficiently promote the expression of soluble and active proteins without altering the amino acid sequence or expression conditions. This strategy uses the rational design of translational pausing based on structural information solely through synonymous substitutions, i.e. no change in the amino acid sequence. We demonstrated this strategy on a promising antiviral candidate, Cyanovirin-N (CVN), which could not be efficiently expressed in any previously reported system. By introducing silent mutations, we increased the soluble expression level in E. coli by 2000-fold without altering the CVN protein sequence, and the specific activity was slightly higher for the optimized CVN than for the wild-type variant. This strategy introduces new possibilities for the production of bioactive recombinant proteins.
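
    The constraint that a redesigned gene uses only synonymous substitutions can be checked mechanically. The sketch below is illustrative and is not the authors' design method: it assumes the standard genetic code, and the function names and toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def normalize(cds):
    """Upper-case the sequence and treat RNA 'U' as DNA 'T'."""
    return cds.upper().replace("U", "T")


def translate(cds):
    """Translate an in-frame coding sequence ('*' marks a stop codon)."""
    cds = normalize(cds)
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def synonymous_changes(wild_type, redesigned):
    """Return 0-based codon indices that differ between the two sequences.

    Raises ValueError if the redesign alters the encoded protein.
    """
    wt, new = normalize(wild_type), normalize(redesigned)
    if translate(wt) != translate(new):
        raise ValueError("redesign changes the amino acid sequence")
    return [i // 3 for i in range(0, len(wt), 3) if wt[i:i + 3] != new[i:i + 3]]


# Toy example: Met-Lys-Gly encoded two ways; codons 1 and 2 differ silently.
changed = synonymous_changes("ATGAAAGGT", "ATGAAGGGC")
```

    Checking the translation before listing the changed codons keeps an accidental non-synonymous edit from passing unnoticed.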

  18. New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation

    PubMed Central

    McLysaght, Aoife; Guerzoni, Daniele

    2015-01-01

    The origin of novel protein-coding genes de novo was once considered so improbable as to be impossible. In less than a decade, and especially in the last five years, this view has been overturned by extensive evidence from diverse eukaryotic lineages. There is now evidence that this mechanism has contributed a significant number of genes to genomes of organisms as diverse as Saccharomyces, Drosophila, Plasmodium, Arabidopsis and human. From simple beginnings, these genes have in some instances acquired complex structure, regulated expression and important functional roles. New genes are often thought of as dispensable late additions; however, some recent de novo genes in human can play a role in disease. Rather than an extremely rare occurrence, it is now evident that there is a relatively constant trickle of proto-genes released into the testing ground of natural selection. It is currently unknown whether de novo genes arise primarily through an ‘RNA-first’ or ‘ORF-first’ pathway. Either way, evolutionary tinkering with this pool of genetic potential may have been a significant player in the origins of lineage-specific traits and adaptations. PMID:26323763

  19. Molecular cloning of the goose ACSL3 and ACSL5 coding domain sequences and their expression characteristics during goose fatty liver development.

    PubMed

    He, H; Liu, H H; Wang, J W; Lv, J; Li, L; Pan, Z X

    2014-01-01

    It has been demonstrated that ACSL3 and ACSL5 play important roles in fat metabolism. To investigate the primary functions of ACSL3 and ACSL5 and to evaluate their expression levels during goose fatty liver development, we cloned the ACSL3 and ACSL5 coding domain sequences (CDSs) of geese using RT-PCR and analyzed their expression characteristics under different conditions using qRT-PCR. The results showed that the goose ACSL3 (JX511975) and ACSL5 (JX511976) sequences have high similarities with the chicken sequences both at the nucleotide and amino acid levels. Both ACSL3 and ACSL5 have high expression levels in goose liver. The expression levels of ACSL3 and ACSL5 in goose liver and hepatocytes can be changed by overfeeding geese and by treatment with unsaturated fatty acids, respectively. Together, these results indicate that ACSL3 and ACSL5 play important roles during fatty liver development. The different expression characteristics of goose ACSL3 and ACSL5 suggest that these two genes may be responsible for specific functions.

  20. Detection and isolation of nucleic acid sequences using a bifunctional hybridization probe

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2000-01-01

    A method for detecting and isolating a target sequence in a sample of nucleic acids is provided using a bifunctional hybridization probe capable of hybridizing to the target sequence that includes a detectable marker and a first complexing agent capable of forming a binding pair with a second complexing agent. A kit is also provided for detecting a target sequence in a sample of nucleic acids using a bifunctional hybridization probe according to this method.

  1. The nucleotide sequence of the gene coding for the elongation factor 1 alpha in Sulfolobus solfataricus. Homology of the product with related proteins.

    PubMed

    Arcari, P; Gallo, M; Ianniciello, G; Dello Russo, A; Bocchini, V

    1994-04-01

    The cloning and sequencing of the gene coding for the archaebacterial elongation factor 1 alpha (aEF-1 alpha) was performed by screening a Sulfolobus solfataricus genomic library using a probe constructed from the heptapeptide KNMITGA that is conserved in all the EF-1 alpha/EF-Tu known so far. The isolated recombinant phage contained the part of the aEF-1 alpha gene from amino acids 1 to 171. The other part (amino acids 162-435) was obtained through the amplification of the S. solfataricus DNA by PCR. The codon usage by the aEF-1 alpha gene showed a preference for triplets ending in A and/or T. This behavior was almost identical to that of the S. acidocaldarius EF-1 alpha gene but differed greatly from that of EF-1 alpha/EF-Tu genes in other archaebacteria, eukaryotes and eubacteria. The translated protein is made of 435 amino acid residues and contains sequence motifs for the binding of GTP, tRNA and ribosome. Alignments of aEF-1 alpha with several EF-1 alpha/EF-Tu revealed that aEF-1 alpha is more similar to its eukaryotic than to its eubacterial counterparts. PMID:8148382
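
    A preference for triplets ending in A or T is commonly summarized as the fraction of codons with A or T in the third position. The sketch below shows one way to compute it; the function name and the toy sequences are hypothetical, and no values from the study are reproduced.

```python
def third_position_at_fraction(cds):
    """Fraction of complete codons whose third base is A or T.

    The input is assumed to be an in-frame DNA coding sequence;
    trailing bases that do not form a full codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence contains no complete codon")
    return sum(codon[2] in "AT" for codon in codons) / len(codons)


# Toy examples: both codons end in T or A, then only one of three does.
all_at = third_position_at_fraction("GCTGCA")
one_third = third_position_at_fraction("ATGAAAGGC")
```

    Comparing this fraction across genes from different organisms gives a simple numerical view of the codon usage difference described above.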

  2. Molecular characterization of coding sequences and analysis of Toll-like receptor 3 mRNA expression in water buffalo (Bubalus bubalis) and nilgai (Boselaphus tragocamelus).

    PubMed

    Dhara, Animesh; Saini, Mohini; Das, Dhanjit K; Swarup, Devendra; Sharma, Bhaskar; Kumar, Satish; Gupta, Praveen K

    2007-01-01

    Toll-like receptor 3 (TLR3), an antiviral innate immunity receptor recognizes double-stranded RNA, preferably of viral origin and induces type I interferon production, which causes maturation of phagocytes and subsequent release of chemical mediators from phagocytes against some viral infections. The present study has characterized TLR3 complementary DNA (cDNA) in buffalo (Bubalus bubalis) and nilgai (Boselaphus tragocamelus). TLR3 coding sequences of both buffalo and nilgai were amplified from cultured dendritic cell cDNA and cloned in pGEMT-easy vector for characterization by restriction endonucleases and nucleotide sequencing. Sequence analysis reveals that 2,715-bp-long TLR3 open reading frame encoding 904 amino acids in buffalo as well as nilgai is similar to that of cattle. Buffalo TLR3 has 98.6 and 97.9% identity at nucleotide level with nilgai and cattle, respectively. Likewise, buffalo TLR3 amino acids share 96.7% identity with cattle and 97.8% with nilgai. Non-synonymous substitutions exceeding synonymous substitutions indicate evolution of this receptor through positive selection among these three ruminant species. Buffalo and nilgai appear to have diverged from a common ancestor in phylogenetic analysis. Predicted protein structures of buffalo and nilgai TLR3 from deduced amino acid sequences indicate that the buffalo and nilgai TLR3 ectodomain may be more efficient in ligand binding than that of cattle. Furthermore, TLR3 messenger RNA expression in tissues as quantified by real-time PCR was found higher in nilgai than buffalo.
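
    Percent identity figures of this kind are typically computed over aligned sequences. A minimal sketch follows, assuming the two sequences are already aligned to equal length and that columns containing a gap ('-') in either sequence are skipped; the abstract does not state the study's exact convention, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns with identical characters."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != gap and b != gap]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Toy example: three of four aligned bases match.
identity = percent_identity("ACGT", "ACGA")
```

    The same function works for nucleotide and amino acid alignments, since it only compares characters column by column.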

  3. Amino acid sequence of horseshoe crab, Tachypleus tridentatus, striated muscle troponin C.

    PubMed

    Kobayashi, T; Kagami, O; Takagi, T; Konishi, K

    1989-05-01

    The amino acid sequence of troponin C obtained from horseshoe crab, Tachypleus tridentatus, striated muscle was determined by sequence analysis and alignments of chemically and enzymatically cleaved peptides. Troponin C is composed of 153 amino acid residues with a blocked N-terminus and contains no tryptophan or cysteine residue. The site I, one of the four Ca2+-binding sites, is considered to have lost its ability to bind Ca2+ owing to the replacements of certain amino acid residues.

  4. Analysis of the multi-copied genes and the impact of the redundant protein coding sequences on gene annotation in prokaryotic genomes.

    PubMed

    Yu, Jia-Feng; Chen, Qing-Li; Ren, Jing; Yang, Yan-Ling; Wang, Ji-Hua; Sun, Xiao

    2015-07-01

    The important roles of duplicated genes in evolutionary processes have been recognized in bacteria, archaebacteria and eukaryotes, but there has been very little study of the multi-copied protein coding genes that share sequence identity of 100%. In this paper, the multi-copied protein coding genes in a number of prokaryotic genomes are first comprehensively analyzed. The results show that 0-15.93% of the protein coding genes in each genome are multi-copied genes and 0-16.49% of the protein coding genes in each genome are highly similar, with sequence identity ≥ 80%. Function and COG (Clusters of Orthologous Groups of proteins) analysis shows that 64.64% of multi-copied genes concentrate on the function of transposase and 86.28% of the COG-assigned multi-copied genes concentrate on the COG code 'L'. Furthermore, the impact of redundant protein coding sequences on gene prediction results is studied. The results show that the problem of protein coding sequence redundancy cannot be ignored, and that the consistency of the gene annotation results before and after excluding the redundant sequences is negatively related to the degree of sequence redundancy among the protein coding sequences in the training set.
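
    Genes that share 100% sequence identity can be found by grouping coding sequences on their exact text. The sketch below assumes the sequences are held in a dictionary keyed by gene identifier; the function names and toy identifiers are hypothetical, and this is not the authors' pipeline.

```python
from collections import defaultdict


def multi_copied_groups(cds_by_id):
    """Group gene identifiers whose coding sequences are exactly identical."""
    groups = defaultdict(list)
    for gene_id, seq in cds_by_id.items():
        groups[seq.upper()].append(gene_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


def multi_copied_fraction(cds_by_id):
    """Fraction of genes that belong to a group of identical sequences."""
    if not cds_by_id:
        raise ValueError("no genes supplied")
    copied = sum(len(group) for group in multi_copied_groups(cds_by_id))
    return copied / len(cds_by_id)


# Toy genome: geneA and geneC are identical copies; geneB is unique.
toy = {"geneA": "ATGAAATAA", "geneB": "ATGCCCTAA", "geneC": "atgaaataa"}
groups = multi_copied_groups(toy)
fraction = multi_copied_fraction(toy)
```

    Exact-match grouping covers only the 100% identity case; the ≥ 80% similarity class mentioned above would need an alignment-based comparison instead.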

  5. Long Non-Coding RNA and Alternative Splicing Modulations in Parkinson's Leukocytes Identified by RNA Sequencing

    PubMed Central

    Soreq, Lilach; Guffanti, Alessandro; Salomonis, Nathan; Simchovitz, Alon; Israel, Zvi; Bergman, Hagai; Soreq, Hermona

    2014-01-01

    The continuously prolonged human lifespan is accompanied by an increase in the incidence of neurodegenerative diseases, calling for the development of inexpensive blood-based diagnostics. Analyzing blood cell transcripts by RNA-Seq is a robust means to identify novel biomarkers that is rapidly becoming commonplace. However, there is a lack of tools to discover novel exons, junctions and splicing events and to precisely and sensitively assess differential splicing through RNA-Seq data analysis and across RNA-Seq platforms. Here, we present a new and comprehensive computational workflow for whole-transcriptome RNA-Seq analysis, using an updated version of the software AltAnalyze, to identify both known and novel high-confidence alternative splicing events, and to integrate them with both protein-domains and microRNA binding annotations. We applied the novel workflow on RNA-Seq data from Parkinson's disease (PD) patients' leukocytes pre- and post- Deep Brain Stimulation (DBS) treatment and compared to healthy controls. Disease-mediated changes included decreased usage of alternative promoters and N-termini, 5′-end variations and mutually-exclusive exons. The PD regulated FUS and HNRNP A/B included prion-like domains regulated regions. We also present here a workflow to identify and analyze long non-coding RNAs (lncRNAs) via RNA-Seq data. We identified reduced lncRNA expression and selective PD-induced changes in 13 of over 6,000 detected leukocyte lncRNAs, four of which were inversely altered post-DBS. These included the U1 spliceosomal lncRNA and RP11-462G22.1, each entailing sequence complementarity to numerous microRNAs. Analysis of RNA-Seq from PD and unaffected controls brains revealed over 7,000 brain-expressed lncRNAs, of which 3,495 were co-expressed in the leukocytes including U1, which showed both leukocyte and brain increases. Furthermore, qRT-PCR validations confirmed these co-increases in PD leukocytes and two brain regions, the amygdala and substantia

  6. Cortical and subcortical contributions to sequence retrieval: schematic coding of temporal context in the neocortical recollection network

    PubMed Central

    Hsieh, Liang-Tien; Ranganath, Charan

    2015-01-01

    Episodic memory entails the ability to remember what happened when. Although the available evidence indicates that the hippocampus plays a role in structuring serial order information during retrieval of event sequences, information processed in the hippocampus must be conveyed to other cortical and subcortical areas in order to guide behavior. However, the extent to which other brain regions contribute to the temporal organization of episodic memory remains unclear. Here, we examined multivoxel activity pattern changes during retrieval of learned and random object sequences, focusing on a neocortical “core recollection network” that includes the medial prefrontal cortex, retrosplenial cortex, and angular gyrus, as well as on striatal areas including the caudate nucleus and putamen that have been implicated in processing of sequence information. The results demonstrate that regions of the core recollection network carry information about temporal positions within object sequences, irrespective of object information. This schematic coding of temporal information is in contrast to the putamen, which carried information specific to objects in learned sequences, and the caudate, which carried information about objects, irrespective of sequence context. Our results suggest a role for the cortical recollection network in the representation of temporal structure of events during episodic retrieval, and highlight the possible mechanisms by which the striatal areas may contribute to this process. More broadly, the results indicate that temporal sequence retrieval is a useful paradigm for dissecting the contributions of specific brain regions to episodic memory. PMID:26209802

  7. Cortical and subcortical contributions to sequence retrieval: Schematic coding of temporal context in the neocortical recollection network.

    PubMed

    Hsieh, Liang-Tien; Ranganath, Charan

    2015-11-01

    Episodic memory entails the ability to remember what happened when. Although the available evidence indicates that the hippocampus plays a role in structuring serial order information during retrieval of event sequences, information processed in the hippocampus must be conveyed to other cortical and subcortical areas in order to guide behavior. However, the extent to which other brain regions contribute to the temporal organization of episodic memory remains unclear. Here, we examined multivoxel activity pattern changes during retrieval of learned and random object sequences, focusing on a neocortical "core recollection network" that includes the medial prefrontal cortex, retrosplenial cortex, and angular gyrus, as well as on striatal areas including the caudate nucleus and putamen that have been implicated in processing of sequence information. The results demonstrate that regions of the core recollection network carry information about temporal positions within object sequences, irrespective of object information. This schematic coding of temporal information is in contrast to the putamen, which carried information specific to objects in learned sequences, and the caudate, which carried information about objects, irrespective of sequence context. Our results suggest a role for the cortical recollection network in the representation of temporal structure of events during episodic retrieval, and highlight the possible mechanisms by which the striatal areas may contribute to this process. More broadly, the results indicate that temporal sequence retrieval is a useful paradigm for dissecting the contributions of specific brain regions to episodic memory. PMID:26209802

  9. Conserved sequences in both coding and 5' flanking regions of mammalian opal suppressor tRNA genes.

    PubMed Central

    Pratt, K; Eden, F C; You, K H; O'Neill, V A; Hatfield, D

    1985-01-01

    The rabbit genome encodes an opal suppressor tRNA gene. The coding region is strictly conserved between the rabbit gene and the corresponding gene in the human genome. The rabbit opal suppressor gene contains the consensus sequence in the 3' internal control region but, like the human and chicken genes, the rabbit 5' internal control region contains two additional nucleotides. The 5' flanking sequences of the rabbit and the human opal suppressor genes contain extensive regions of homology. A subset of these homologies is also present 5' to the chicken opal suppressor gene. Both the rabbit and the human genomes also encode a pseudogene. That of the rabbit lacks the 3' half of the coding region. Neither pseudogene has homologous regions to the 5' flanking regions of the genes. The presence of 5' homologies flanking only the transcribed genes and not the pseudogenes suggests that these regions may be regulatory control elements specifically involved in the expression of the eukaryotic opal suppressor gene. Moreover, the strict conservation of coding sequences indicates functional importance for the opal suppressor tRNA genes. PMID:4022772

  10. 5'-coding sequence of the nasA gene of Azotobacter vinelandii is required for efficient expression.

    PubMed

    Wang, Baomin; Wang, Yumei; Kennedy, Christina

    2014-10-01

    The operon nasACBH in Azotobacter vinelandii encodes nitrate and nitrite reductases that sequentially reduce nitrate to nitrite and to ammonium for nitrogen assimilation into organic molecules. Our previous analyses showed that nasACBH expression is subject to antitermination regulation that occurs upstream of the nasA gene in response to the availability of nitrate and nitrite. In this study, we continued expression analyses of the nasA gene and observed that the nasA 5'-coding sequence plays an important role in gene expression, as demonstrated by the fact that deletions caused over sixfold reduction in the expression of the lacZ reporter gene. Further analysis suggests that the nasA 5'-coding sequence promotes gene expression in a way that is not associated with weakened transcript folding around the translational initiation region or codon usage bias. The findings from this study imply that there exists potential to improve gene expression in A. vinelandii by optimizing 5'-coding sequences.

  11. Identification, characterization, and complete amino acid sequence of the conjugation-inducing glycoprotein (blepharmone) in the ciliate Blepharisma japonicum

    PubMed Central

    Sugiura, Mayumi; Harumoto, Terue

    2001-01-01

    Conjugation in Blepharisma japonicum is induced by interaction between complementary mating-types I and II, which excrete blepharmone (gamone 1) and blepharismone (gamone 2), respectively. Gamone 1 transforms type II cells such that they can unite, and gamone 2 similarly transforms type I cells. Moreover, each gamone promotes the production of the other gamone. Gamone 2 has been identified as calcium-3-(2′-formylamino-5′-hydroxy-benzoyl) lactate and has been synthesized chemically. Gamone 1 was isolated and characterized as a glycoprotein of 20–30 kDa containing 175 amino acids and 6 sugars. However, the amino acid sequence and arrangement of sugars in this gamone are still unknown. To determine partial amino acid sequences of gamone 1, we established a method of isolation based on the finding that this glycoprotein can be concentrated by a Con A affinity column. Gamone 1 is extremely unstable and loses its biological activity once adsorbed to any of the columns that we tested. By using a Con A affinity column and native PAGE, we detected a 30-kDa protein corresponding to gamone 1 activity and determined the partial amino acid sequences of the four peptides. To isolate gamone 1 cDNA, we isolated mRNA from mating-type I cells stimulated by synthetic gamone 2 and then performed rapid amplification of cDNA ends procedures by using gene-specific primers and cloned cDNA of gamone 1. The cDNA sequence contains an ORF of 305 amino acids and codes a possibly novel protein. We also estimated the arrangement of sugars by comparing the affinity to various lectin columns. PMID:11724922

  12. Identification, characterization, and complete amino acid sequence of the conjugation-inducing glycoprotein (blepharmone) in the ciliate Blepharisma japonicum.

    PubMed

    Sugiura, M; Harumoto, T

    2001-12-01

    Conjugation in Blepharisma japonicum is induced by interaction between complementary mating-types I and II, which excrete blepharmone (gamone 1) and blepharismone (gamone 2), respectively. Gamone 1 transforms type II cells such that they can unite, and gamone 2 similarly transforms type I cells. Moreover, each gamone promotes the production of the other gamone. Gamone 2 has been identified as calcium-3-(2'-formylamino-5'-hydroxy-benzoyl) lactate and has been synthesized chemically. Gamone 1 was isolated and characterized as a glycoprotein of 20-30 kDa containing 175 amino acids and 6 sugars. However, the amino acid sequence and arrangement of sugars in this gamone are still unknown. To determine partial amino acid sequences of gamone 1, we established a method of isolation based on the finding that this glycoprotein can be concentrated by a Con A affinity column. Gamone 1 is extremely unstable and loses its biological activity once adsorbed to any of the columns that we tested. By using a Con A affinity column and native PAGE, we detected a 30-kDa protein corresponding to gamone 1 activity and determined the partial amino acid sequences of the four peptides. To isolate gamone 1 cDNA, we isolated mRNA from mating-type I cells stimulated by synthetic gamone 2 and then performed rapid amplification of cDNA ends procedures by using gene-specific primers and cloned cDNA of gamone 1. The cDNA sequence contains an ORF of 305 amino acids and codes a possibly novel protein. We also estimated the arrangement of sugars by comparing the affinity to various lectin columns.

  13. Automatic identification of large collections of protein-coding or rRNA sequences.

    PubMed

    Arigon, Anne-Muriel; Perrière, Guy; Gouy, Manolo

    2008-04-01

    The number of available genomic sequences is growing very fast, due to the development of massive sequencing techniques. Sequence identification is needed and contributes to the assessment of gene and species evolutionary relationships. Automated bioinformatics tools are thus necessary to carry out these identification operations in an accurate and fast way. We developed HoSeqI (Homologous Sequence Identification), a software environment allowing this kind of automated sequence identification using homologous gene family databases. HoSeqI is accessible through a Web interface (http://pbil.univ-lyon1.fr/software/HoSeqI/) that allows users to identify one or several sequences and to visualize the resulting alignments and phylogenetic trees. We also implemented another application, MultiHoSeqI, to quickly add a large set of sequences to a family database in order to identify them, to update the database, or to help automatic genome annotation. Lately, we developed an application, ChiSeqI (Chimeric Sequence Identification), to automate the processes of identification of bacterial 16S ribosomal RNA sequences and of detection of chimeric sequences.

  14. Trichomonas vaginalis acidic phospholipase A2: isolation and partial amino acid sequence.

    PubMed

    Escobedo-Guajardo, Brenda L; González-Salazar, Francisco; Palacios-Corona, Rebeca; Torres de la Cruz, Víctor M; Morales-Vallarta, Mario; Mata-Cárdenas, Benito D; Garza-González, Jesús N; Rivera-Silva, Gerardo; Vargas-Villarreal, Javier

    2013-12-01

    Sexually transmitted diseases are a major cause of acute disease worldwide, and trichomoniasis is the most common and curable disease, generating more than 170 million cases annually worldwide. Trichomonas vaginalis is the causal agent of trichomoniasis and has the ability to destroy in vitro cell monolayers of the vaginal mucosa, where the phospholipases A2 (PLA2) have been reported as potential virulence factors. These enzymes have been partially characterized from the subcellular fraction S30 of pathogenic T. vaginalis strains. The main objective of this study was to purify a phospholipase A2 from T. vaginalis, make a partial characterization, obtain a partial amino acid sequence, and determine its enzymatic participation as a hemolytic factor causing lysis of erythrocytes. Trichomonas S30, RF30 and UFF30 sub-fractions from the GT-15 strain have the capacity to hydrolyze [2-(14)C-PA]-PC at pH 6.0. Proteins from the UFF30 sub-fraction were separated by affinity chromatography into two eluted fractions with detectable PLA2 activity. The EDTA-eluted fraction was analyzed by on-line HPLC-tandem mass spectrometry and two protein peaks were observed at 8.2 and 13 kDa. Peptide sequences were identified from the proteins present in the eluted EDTA UFF30 fraction; bioinformatic analysis using Protein Link Global Server charged with the T. vaginalis protein database suggests that the eluted peptides correspond to a putative ubiquitin protein in the 8.2 kDa fraction and a phospholipase preserved in the 13 kDa fraction. The EDTA-eluted fraction hydrolyzed [2-(14)C-PA]-PC and lysed erythrocytes from Sprague-Dawley rats in a time- and dose-dependent manner. The acidic hemolytic activity decreased by 84% with the addition of 100 μM of Rosenthal's inhibitor. PMID:24338313

  16. tax and rex Sequences of bovine leukaemia virus from globally diverse isolates: rex amino acid sequence more variable than tax.

    PubMed

    McGirr, K M; Buehring, G C

    2005-02-01

    Bovine leukaemia virus (BLV) is an important agricultural problem with high costs to the dairy industry. Here, we examine the variation of the tax and rex genes of BLV. The tax and rex genes share 420 bases and have overlapping reading frames. The tax gene encodes a protein that functions as a transactivator of the BLV promoter, is required for viral replication, acts on cellular promoters, and is responsible for oncogenesis. The Rex protein facilitates the export of viral mRNAs from the nucleus and regulates transcription. We have sequenced five new isolates of the tax/rex gene. We examined the five new and three previously published tax/rex DNA and predicted amino acid sequences of BLV isolates from cattle in representative regions worldwide. The highest variation among nucleic acid sequences for tax and rex was 7% and 5%, respectively; among predicted amino acid sequences for Tax and Rex, 9% and 11%, respectively. Significantly more nucleotide changes resulted in predicted amino acid changes in the rex gene than in the tax gene (P ≤ 0.0006). This variability is higher than previously reported for any region of the viral genome. This research may also have implications for the development of Tax-based vaccines. PMID:15702995

  17. A nucleic acid sequence-based amplification system for detection of Listeria monocytogenes hlyA sequences.

    PubMed Central

    Blais, B W; Turner, G; Sooknanan, R; Malek, L T

    1997-01-01

    A nucleic acid sequence-based amplification system primarily targeting mRNA from the Listeria monocytogenes hlyA gene was developed. This system enabled the detection of low numbers (< 10 CFU/g) of L. monocytogenes cells inoculated into a variety of dairy and egg products after 48 h of enrichment in modified listeria enrichment broth. PMID:8979357

  18. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration.

  19. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-03-24

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration. 14 figs.

  20. The amino acid sequence of elephant (Elephas maximus) myoglobin and the phylogeny of Proboscidea.

    PubMed

    Dene, H; Goodman, M; Romero-Herrera, A E

    1980-02-13

    The complete amino acid sequence of skeletal myoglobin from the Asian elephant (Elephas maximus) is reported. The functional significance of variations seen when this sequence is compared with that of sperm whale myoglobin is explored in the light of the crystallographic model available for the latter molecule. The phylogenetic implications of the elephant myoglobin amino acid sequence are evaluated by using the maximum parsimony technique. A similar analysis is also presented which incorporates all of the proteins sequenced from the elephant. These results are discussed with respect to current views on proboscidean phylogeny.

  1. Identification of protein-coding sequences using the hybridization of 18S rRNA and mRNA during translation.

    PubMed

    Xing, Chuanhua; Bitzer, Donald L; Alexander, Winser E; Vouk, Mladen A; Stomp, Anne-Marie

    2009-02-01

    We introduce a new approach in this article to distinguish protein-coding sequences from non-coding sequences utilizing a period-3, free energy signal that arises from the interactions of the 3'-terminal nucleotides of the 18S rRNA with mRNA. We extracted the special features of the amplitude and the phase of the period-3 signal in protein-coding regions, which are not found in non-coding regions, and used them to distinguish protein-coding sequences from non-coding sequences. We tested on all the experimental genes from Saccharomyces cerevisiae and Schizosaccharomyces pombe. The identification was consistent with the corresponding information from GenBank, and produced better performance compared to existing methods that use a period-3 signal. The primary tests on some fly, mouse and human genes suggest that our method is applicable to higher eukaryotic genes. The tests on pseudogenes indicated that most pseudogenes have no period-3 signal. Some exploration of the 3'-tail of 18S rRNA and pattern analysis of protein-coding sequences further supported our assumption that the 3'-tail of 18S rRNA has a role of synchronization throughout the translation elongation process. This, in turn, can be utilized for the identification of protein-coding sequences.

  2. In silico mining of microsatellites in coding sequences of the date palm (Arecaceae) genome, characterization, and transferability

    PubMed Central

    Aberlenc-Bertossi, Frédérique; Castillo, Karina; Tranchant-Dubreuil, Christine; Chérif, Emira; Ballardini, Marco; Abdoulkader, Sabira; Gros-Balthazard, Muriel; Chabrillange, Nathalie; Santoni, Sylvain; Mercuri, Antonio; Pintaud, Jean-Christophe

    2014-01-01

    • Premise of the study: To complement existing sets of primarily dinucleotide microsatellite loci from noncoding sequences of date palm, we developed primers for tri- and hexanucleotide microsatellite loci identified within genes. Due to their conserved genomic locations, the primers should be useful in other palm taxa, and their utility was tested in seven other Phoenix species and in Chamaerops, Livistona, and Hyphaene. • Methods and Results: Tandem repeat motifs of 3–6 bp were searched using a simple sequence repeat (SSR)–pipeline package in coding portions of the date palm draft genome sequence. Fifteen loci produced highly consistent amplification, intraspecific polymorphisms, and stepwise mutation patterns. • Conclusions: These microsatellite loci showed sufficient levels of variability and transferability to make them useful for population genetic, selection signature, and interspecific gene flow studies in Phoenix and other Coryphoideae genera. PMID:25202594
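    The search step described in the abstract above (tandem repeat motifs of 3–6 bp in coding sequence) can be illustrated with a minimal sketch. This is not the SSR pipeline package the authors used; the function name `find_ssrs`, the `min_repeats` threshold, and the toy sequence are hypothetical and chosen only for illustration.

```python
import re


def find_ssrs(seq, min_repeats=4):
    """Return (start, motif, copies) for tandem repeats of 3-6 bp motifs.

    Hypothetical helper: the min_repeats threshold is an assumption,
    not a value taken from the abstract.
    """
    # A motif of 3-6 bases followed by at least (min_repeats - 1) more copies.
    pattern = re.compile(r"([ACGT]{3,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        # Skip motifs that are themselves repeats of a shorter unit
        # (e.g. "AAA" or "ATAT"), which are really mono- or dinucleotide runs.
        if (motif + motif).find(motif, 1) != len(motif):
            continue
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


# Toy sequence (made up): five CAG copies flanked by non-repetitive bases.
toy = "TTGA" + "CAG" * 5 + "TTACG"
print(find_ssrs(toy))
```

    Positions are 0-based. A production search would also need to handle ambiguous bases and restrict the scan to annotated coding regions, which this sketch does not attempt.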

  3. Differential effects of high-temperature stress on nuclear topology and transcription of repetitive noncoding and coding rye sequences.

    PubMed

    Tomás, D; Brazão, J; Viegas, W; Silva, M

    2013-01-01

    The plant stress response has been extensively characterized at the biochemical and physiological levels. However, knowledge concerning repetitive sequence genome fraction modulation during extreme temperature conditions is scarce. We studied high-temperature effects on subtelomeric repetitive sequences (pSc200) and 45S rDNA in rye seedlings subjected to 40°C for 4 h. Chromatin organization patterns were evaluated through fluorescent in situ hybridization and transcription levels were assessed using quantitative real-time PCR. Additionally, the nucleolar dynamics were evaluated through fibrillarin immunodetection in interphase nuclei. The results obtained clearly demonstrated that the pSc200 sequence organization is not affected by high-temperature stress (HTS) and proved for the first time that this noncoding subtelomeric sequence is stably transcribed. Conversely, it was demonstrated that HTS treatment induces marked rDNA chromatin decondensation along with nucleolar enlargement and a significant increase in ribosomal gene transcription. The roles of noncoding and coding repetitive rye sequences in the plant stress response that are suggested by their clearly distinct behaviors are discussed. While the heterochromatic conformation of pSc200 sequences seems to be involved in the stabilization of the interphase chromatin architecture under stress conditions, the dynamic modulation of nucleolar and rDNA topology and transcription suggests their role in plant stress response pathways.

  4. An ultra-sparse code underlies the generation of neural sequences in a songbird

    NASA Astrophysics Data System (ADS)

    Hahnloser, Richard H. R.; Kozhevnikov, Alexay A.; Fee, Michale S.

    2002-09-01

    Sequences of motor activity are encoded in many vertebrate brains by complex spatio-temporal patterns of neural activity; however, the neural circuit mechanisms underlying the generation of these pre-motor patterns are poorly understood. In songbirds, one prominent site of pre-motor activity is the forebrain robust nucleus of the archistriatum (RA), which generates stereotyped sequences of spike bursts during song and recapitulates these sequences during sleep. We show that the stereotyped sequences in RA are driven from nucleus HVC (high vocal centre), the principal pre-motor input to RA. Recordings of identified HVC neurons in sleeping and singing birds show that individual HVC neurons projecting onto RA neurons produce bursts sparsely, at a single, precise time during the RA sequence. These HVC neurons burst sequentially with respect to one another. We suggest that at each time in the RA sequence, the ensemble of active RA neurons is driven by a subpopulation of RA-projecting HVC neurons that is active only at that time. As a population, these HVC neurons may form an explicit representation of time in the sequence. Such a sparse representation, a temporal analogue of the `grandmother cell' concept for object recognition, eliminates the problem of temporal interference during sequence generation and learning attributed to more distributed representations.

  5. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis

    PubMed Central

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-01-01

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled. PMID:26586576
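    The abstract above uses the contig N50 as its measure of assembly quality. As a minimal sketch of how that statistic is usually computed (the function name and the toy contig lengths are hypothetical, not data from the study): sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the summed length.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to at least
        # half of the assembly length defines the N50.
        if running >= half_total:
            return length


# Made-up contig lengths: total 300, so the threshold is 150.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 80 + 70 reaches 150, so this prints 70
```

    A doubling of N50, as reported in the abstract, means that the contig length at this halfway point doubled; it does not by itself say how many contigs were merged.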

  6. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

    PubMed

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-11-20

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.

  7. Facile Analysis and Sequencing of Linear and Branched Peptide Boronic Acids by MALDI Mass Spectrometry

    PubMed Central

    Crumpton, Jason; Zhang, Wenyu; Santos, Webster

    2011-01-01

    Interest in peptides incorporating boronic acid moieties is increasing due to their potential as therapeutics/diagnostics for a variety of diseases such as cancer. The utility of peptide boronic acids may be expanded with access to vast libraries that can be deconvoluted rapidly and economically. Unfortunately, current detection protocols using mass spectrometry are laborious and confounded by boronic acid trimerization, which requires time consuming analysis of dehydration products. These issues are exacerbated when the peptide sequence is unknown, as with de novo sequencing, and especially when multiple boronic acid moieties are present. Thus, a rapid, reliable and simple method for peptide identification is of utmost importance. Herein, we report the identification and sequencing of linear and branched peptide boronic acids containing up to five boronic acid groups by matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS). Protocols for preparation of pinacol boronic esters were adapted for efficient MALDI analysis of peptides. Additionally, a novel peptide boronic acid detection strategy was developed in which 2,5-dihydroxybenzoic acid (DHB) served as both matrix and derivatizing agent in a convenient, in situ, on-plate esterification. Finally, we demonstrate that DHB-modified peptide boronic acids from a single bead can be analyzed by MALDI-MSMS analysis, validating our approach for the identification and sequencing of branched peptide boronic acid libraries. PMID:21449540

  8. Evolution of an Enzyme from a Noncatalytic Nucleic Acid Sequence.

    PubMed

    Gysbers, Rachel; Tram, Kha; Gu, Jimmy; Li, Yingfu

    2015-01-01

    The mechanism by which enzymes arose from both abiotic and biological worlds remains an unsolved natural mystery. We postulate that an enzyme can emerge from any sequence of any functional polymer under permissive evolutionary conditions. To support this premise, we have arbitrarily chosen a 50-nucleotide DNA fragment encoding for the Bos taurus (cattle) albumin mRNA and subjected it to test-tube evolution to derive a catalytic DNA (DNAzyme) with RNA-cleavage activity. After only a few weeks, a DNAzyme with significant catalytic activity has surfaced. Sequence comparison reveals that seven nucleotides are responsible for the conversion of the noncatalytic sequence into the enzyme. Deep sequencing analysis of DNA pools along the evolution trajectory has identified individual mutations as the progressive drivers of the molecular evolution. Our findings demonstrate that an enzyme can indeed arise from a sequence of a functional polymer via permissive molecular evolution, a mechanism that may have been exploited by nature for the creation of the enormous repertoire of enzymes in the biological world today. PMID:26091540

  9. Genome defense against exogenous nucleic acids in eukaryotes by non-coding DNA occurs through CRISPR-like mechanisms in the cytosol and the bodyguard protection in the nucleus.

    PubMed

    Qiu, Guo-Hua

    2016-01-01

    In this review, the protective function of the abundant non-coding DNA in the eukaryotic genome is discussed from the perspective of genome defense against exogenous nucleic acids. Peripheral non-coding DNA has been proposed to act as a bodyguard that protects the genome and the central protein-coding sequences from ionizing radiation-induced DNA damage. In the proposed mechanism of protection, the radicals generated by water radiolysis in the cytosol and IR energy are absorbed, blocked and/or reduced by peripheral heterochromatin; then, the DNA damage sites in the heterochromatin are removed and expelled from the nucleus to the cytoplasm through nuclear pore complexes, most likely through the formation of extrachromosomal circular DNA. To strengthen this hypothesis, this review summarizes the experimental evidence supporting the protective function of non-coding DNA against exogenous nucleic acids. Based on these data, I hypothesize herein about the presence of an additional line of defense formed by small RNAs in the cytosol in addition to their bodyguard protection mechanism in the nucleus. Therefore, exogenous nucleic acids may be initially inactivated in the cytosol by small RNAs generated from non-coding DNA via mechanisms similar to the prokaryotic CRISPR-Cas system. Exogenous nucleic acids may enter the nucleus, where some are absorbed and/or blocked by heterochromatin and others integrate into chromosomes. The integrated fragments and the sites of DNA damage are removed by repetitive non-coding DNA elements in the heterochromatin and excluded from the nucleus. Therefore, the normal eukaryotic genome and the central protein-coding sequences are triply protected by non-coding DNA against invasion by exogenous nucleic acids. This review provides evidence supporting the protective role of non-coding DNA in genome defense.

  10. Cloning and sequence analysis of cDNA coding for a lectin from Helianthus tuberosus callus and its jasmonate-induced expression.

    PubMed

    Nakagawa, R; Yasokawa, D; Okumura, Y; Nagashima, K

    2000-06-01

    Two lectins (designated as HTA I and HTA II) that seemed to be isolectins were found in Helianthus tuberosus callus. cDNA encoding HTA I was isolated from a ZAP Express expression library by immunoselection by using the anti-HTA antiserum. The sequence of this cDNA consisted of 432 bp nucleotides coding for a polypeptide of 143 amino acid residues (Mr, 15,314). When introduced into E. coli, the cDNA directed the synthesis of active HTA I as indicated by the hemagglutination activity. The deduced amino acid sequence showed homology with some lectins and jasmonate-induced proteins. When callus was cultured in the presence of methyl jasmonate (MeJA), the hemagglutination activity increased in a dose-dependent manner. The levels of expression of the HTA protein and of the corresponding mRNA also increased in the treated callus. In view of these results, HTA I is considered to be a jasmonate-induced protein. PMID:10923797

  11. Revealing the amino acid composition of proteins within an expanded genetic code

    PubMed Central

    Aerni, Hans R.; Shifman, Mark A.; Rogulina, Svetlana; O'Donoghue, Patrick; Rinehart, Jesse

    2015-01-01

    The genetic code can be manipulated to reassign codons for the incorporation of non-standard amino acids (NSAA). Deletion of release factor 1 in Escherichia coli enhances translation of UAG (Stop) codons, yet may also extend protein synthesis at natural UAG terminated messenger RNAs. The fidelity of protein synthesis at reassigned UAG codons and the purity of the NSAA containing proteins produced require careful examination. Proteomics would be an ideal tool for these tasks, but conventional proteomic analyses cannot readily identify the extended proteins and accurately discover multiple amino acid (AA) insertions at a single UAG. To address these challenges, we created a new proteomic workflow that enabled the detection of UAG readthrough in native proteins in E. coli strains in which UAG was reassigned to encode phosphoserine. The method also enabled quantitation of NSAA and natural AA incorporation at UAG in a recombinant reporter protein. As a proof-of-principle, we measured the fidelity and purity of the phosphoserine orthogonal translation system (OTS) and used this information to improve its performance. Our results show a surprising diversity of natural AAs at reassigned stop codons. Our method can be used to improve OTSs and to quantify amino acid purity at reassigned codons in organisms with expanded genetic codes. PMID:25378305

  12. Computer Simulation of the Determination of Amino Acid Sequences in Polypeptides

    ERIC Educational Resources Information Center

    Daubert, Stephen D.; Sontum, Stephen F.

    1977-01-01

    Describes a computer program that generates a random string of amino acids and guides the student in determining the correct sequence of a given protein by using experimental analytic data for that protein. (MLH)

  13. Targeted capture of homoeologous coding and noncoding sequence in polyploid cotton.

    PubMed

    Salmon, Armel; Udall, Joshua A; Jeddeloh, Jeffrey A; Wendel, Jonathan

    2012-08-01

    Targeted sequence capture is a promising technology in many areas in biology. These methods enable efficient and relatively inexpensive sequencing of hundreds to thousands of genes or genomic regions from many more individuals than is practical using whole-genome sequencing approaches. Here, we demonstrate the feasibility of target enrichment using sequence capture in polyploid cotton. To capture and sequence both members of each gene pair (homeologs) of wild and domesticated Gossypium hirsutum, we created custom hybridization probes to target 1000 genes (500 pairs of homeologs) using information from the cotton transcriptome. Two widely divergent samples of G. hirsutum were hybridized to four custom NimbleGen capture arrays containing probes for targeted genes. We show that the two coresident homeologs in the allopolyploid nucleus were efficiently captured with high coverage. The capture efficiency was similar between the two accessions and independent of whether the samples were multiplexed. A significant amount of flanking, nontargeted sequence (untranslated regions and introns) was also captured and sequenced along with the targeted exons. Intraindividual heterozygosity is low in both wild and cultivated Upland cotton, as expected from the high level of inbreeding in natural G. hirsutum and bottlenecks accompanying domestication. In addition, levels of heterozygosity appeared asymmetrical with respect to genome (A(T) or D(T)) in cultivated cotton. The approach used here is general, scalable, and may be adapted for many different research inquiries involving polyploid plant genomes. PMID:22908041

  14. The sequence of sequencers: The history of sequencing DNA.

    PubMed

    Heather, James M; Chain, Benjamin

    2016-01-01

    Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way.

  15. Next-Generation Sequencing of Protein-Coding and Long Non-protein-Coding RNAs in Two Types of Exosomes Derived from Human Whole Saliva.

    PubMed

    Ogawa, Yuko; Tsujimoto, Masafumi; Yanoshita, Ryohei

    2016-01-01

    Exosomes are small extracellular vesicles containing microRNAs and mRNAs that are produced by various types of cells. We previously used ultrafiltration and size-exclusion chromatography to isolate two types of human salivary exosomes (exosomes I, II) that are different in size and proteomes. We showed that salivary exosomes contain large repertoires of small RNAs. However, precise information regarding long RNAs in salivary exosomes has not been fully determined. In this study, we investigated the compositions of protein-coding RNAs (pcRNAs) and long non-protein-coding RNAs (lncRNAs) of exosome I, exosome II and whole saliva (WS) by next-generation sequencing technology. Although 11% of all RNAs were commonly detected among the three samples, the compositions of reads mapping to known RNAs were similar. The most abundant pcRNA is ribosomal RNA protein, and pcRNAs of some salivary proteins such as S100 calcium-binding protein A8 (protein S100-A8) were present in salivary exosomes. Interestingly, lncRNAs of pseudogenes (presumably, processed pseudogenes) were abundant in exosome I, exosome II and WS. Translationally controlled tumor protein gene, which plays an important role in cell proliferation, cell death and immune responses, was highly expressed as pcRNA and pseudogenes in salivary exosomes. Our results show that salivary exosomes contain various types of RNAs such as pseudogenes and small RNAs, and may mediate intercellular communication by transferring these RNAs to target cells as gene expression regulators. PMID:27582331

  16. Amino Acid Coding Bias of the Hypersaline Dead Sea on an Environmental Scale

    NASA Astrophysics Data System (ADS)

    Rhodes, M. E.; Fitz-Gibbon, S.; Bodaker, I.; Beja, O.; Oren, A.; House, C.

    2008-12-01

    Metagenomic approaches can offer a broad overview of the microbial diversity in an environment and the metabolic processes performed within. At the most general level, knowing merely the GC content of an environment is enough to yield valuable insights as to the makeup of a microbial community. It has been documented that various environmental stresses, such as extreme acidity or salinity, can alter the usage of amino acids within members of an ecosystem. Here we explore the proportion of amino acids encoded within a variety of metagenomes including microbiomes from the human gut, the deep sea subsurface, acid mines, and the Dead Sea. Our primary focus is on strategies employed by hyperhalophiles to cope with the multimolar salinities of their environments. One of the approaches, used by archaea of the order Halobacteriales, as well as by a limited number of halophilic Bacteria, is to accumulate comparable salt concentrations within their cytoplasm. It has been shown within individual species that the cytoplasmic proteins must then be modified in order to maintain their functionality. The changes include an overall increase in acidic amino acids coupled to a decrease in basic amino acids and a decrease in hydrophobic amino acids compensated for by an increase in the borderline hydrophobic amino acids Ser and Thr. We observed these trends within all fully sequenced hyperhalophilic Archaea and two distinct Dead Sea metagenomes (1992 and 2007). Additionally, the ratio of acidic to basic amino acids in the Dead Sea increased between the years 1992 and 2007, from 1.55 to 1.83. This corresponds to an increase of salinity of approximately 30 percent (from 270 ppt to 350 ppt) over the same time period. The shift in ratio of acidic to basic amino acids was not just observable in the metagenome as a whole and the archaeal subpopulation but was also pronounced in the bacterial subpopulation, from 1.27 to 1.62. This shift seems to indicate a restriction of the community from a
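    The acidic-to-basic amino acid ratio reported in the abstract above can be sketched as a simple count over predicted protein sequences. The abstract does not say which residues were counted in each class; the sketch below assumes Asp and Glu as acidic and Lys and Arg as basic, and the function name and toy sequences are hypothetical.

```python
# Assumed residue classes (one-letter codes); the abstract does not
# specify them, and histidine is left out of the basic set here.
ACIDIC = frozenset("DE")
BASIC = frozenset("KR")


def acidic_to_basic_ratio(proteins):
    """Return (count of D + E) / (count of K + R) over all sequences."""
    acidic = 0
    basic = 0
    for seq in proteins:
        for residue in seq.upper():
            if residue in ACIDIC:
                acidic += 1
            elif residue in BASIC:
                basic += 1
    if basic == 0:
        raise ValueError("no basic residues found; ratio is undefined")
    return acidic / basic


# Toy input (made up): four acidic and two basic residues in total.
print(acidic_to_basic_ratio(["DDEK", "ERA"]))  # prints 2.0
```

    Applied to all predicted proteins of a metagenome, a value above 1 indicates an excess of acidic residues, the direction of the trend the abstract describes for hyperhalophiles.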

  17. Functional Divergence of APETALA1 and FRUITFULL is due to Changes in both Regulation and Coding Sequence

    PubMed Central

    McCarthy, Elizabeth W.; Mohamed, Abeer; Litt, Amy

    2015-01-01

    Gene duplications are prevalent in plants, and functional divergence subsequent to duplication may be linked with the occurrence of novel phenotypes in plant evolution. Here, we examine the functional divergence of Arabidopsis thaliana APETALA1 (AP1) and FRUITFULL (FUL), which arose via a duplication correlated with the origin of the core eudicots. Both AP1 and FUL play a role in floral meristem identity, but AP1 is required for the formation of sepals and petals whereas FUL is involved in cauline leaf and fruit development. AP1 and FUL are expressed in mutually exclusive domains but also differ in sequence, with unique conserved motifs in the C-terminal domains of the proteins that suggest functional differentiation. To determine whether the functional divergence of AP1 and FUL is due to changes in regulation or changes in coding sequence, we performed promoter swap experiments, in which FUL was expressed in the AP1 domain in the ap1 mutant and vice versa. Our results show that FUL can partially substitute for AP1, and AP1 can partially substitute for FUL; thus, the functional divergence between AP1 and FUL is due to changes in both regulation and coding sequence. We also mutated AP1 and FUL conserved motifs to determine if they are required for protein function and tested the ability of these mutated proteins to interact in yeast with known partners. We found that these motifs appear to play at best a minor role in protein function and dimerization capability, despite being strongly conserved. Our results suggest that the functional differentiation of these two paralogous key transcriptional regulators involves both differences in regulation and in sequence; however, sequence changes in the form of unique conserved motifs do not explain the differences observed. PMID:26697035

  18. Functional Divergence of APETALA1 and FRUITFULL is due to Changes in both Regulation and Coding Sequence.

    PubMed

    McCarthy, Elizabeth W; Mohamed, Abeer; Litt, Amy

    2015-01-01

    Gene duplications are prevalent in plants, and functional divergence subsequent to duplication may be linked with the occurrence of novel phenotypes in plant evolution. Here, we examine the functional divergence of Arabidopsis thaliana APETALA1 (AP1) and FRUITFULL (FUL), which arose via a duplication correlated with the origin of the core eudicots. Both AP1 and FUL play a role in floral meristem identity, but AP1 is required for the formation of sepals and petals whereas FUL is involved in cauline leaf and fruit development. AP1 and FUL are expressed in mutually exclusive domains but also differ in sequence, with unique conserved motifs in the C-terminal domains of the proteins that suggest functional differentiation. To determine whether the functional divergence of AP1 and FUL is due to changes in regulation or changes in coding sequence, we performed promoter swap experiments, in which FUL was expressed in the AP1 domain in the ap1 mutant and vice versa. Our results show that FUL can partially substitute for AP1, and AP1 can partially substitute for FUL; thus, the functional divergence between AP1 and FUL is due to changes in both regulation and coding sequence. We also mutated AP1 and FUL conserved motifs to determine if they are required for protein function and tested the ability of these mutated proteins to interact in yeast with known partners. We found that these motifs appear to play at best a minor role in protein function and dimerization capability, despite being strongly conserved. Our results suggest that the functional differentiation of these two paralogous key transcriptional regulators involves both differences in regulation and in sequence; however, sequence changes in the form of unique conserved motifs do not explain the differences observed.

  19. Beta-glucosidase coding sequences and protein from Orpinomyces PC-2

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong; Ximenes, Eduardo A.

    2001-02-06

    Provided is a novel β-glucosidase from Orpinomyces sp. PC-2, nucleotide sequences encoding the mature protein and the precursor protein, and methods for recombinant production of this β-glucosidase.

  1. The amino acid sequence of monal pheasant lysozyme and its activity.

    PubMed

    Araki, T; Matsumoto, T; Torikata, T

    1998-10-01

    The amino acid sequence of monal pheasant lysozyme and its activity were analyzed. Carboxymethylated lysozyme was digested with trypsin and the resulting peptides were sequenced. The established amino acid sequence had one amino acid substitution at position 102 (Arg to Gly) compared with Indian peafowl lysozyme and four amino acid substitutions at positions 3 (Phe to Tyr), 15 (His to Leu), 41 (Gln to His), and 121 (Gln to His) compared with chicken lysozyme. Analysis of the time-courses of reaction using N-acetylglucosamine pentamer as a substrate showed a difference in binding free energy change (-0.4 kcal/mol) at subsite A between monal pheasant and Indian peafowl lysozyme. This was assumed to be caused by the amino acid substitution at subsite A with loss of a positive charge at position 102 (Arg102 to Gly). PMID:9836434

  2. cDNA-derived amino acid sequences of myoglobins from nine species of whales and dolphins.

    PubMed

    Iwanami, Kentaro; Mita, Hajime; Yamamoto, Yasuhiko; Fujise, Yoshihiro; Yamada, Tadasu; Suzuki, Tomohiko

    2006-10-01

    We determined the myoglobin (Mb) cDNA sequences of nine cetaceans, of which six are the first reports of Mb sequences: sei whale (Balaenoptera borealis), Bryde's whale (Balaenoptera edeni), pygmy sperm whale (Kogia breviceps), Stejneger's beaked whale (Mesoplodon stejnegeri), Longman's beaked whale (Indopacetus pacificus), and melon-headed whale (Peponocephala electra), and three confirm the previously determined chemical amino acid sequences: sperm whale (Physeter macrocephalus), common minke whale (Balaenoptera acutorostrata) and pantropical spotted dolphin (Stenella attenuata). We found two types of Mb in the skeletal muscle of pantropical spotted dolphin: Mb I with the same amino acid sequence as that deposited in the protein database, and Mb II, which differs at two amino acid residues compared with Mb I. Using an alignment of the amino acid or cDNA sequences of cetacean Mb, we constructed a phylogenetic tree by the NJ method. Clustering of cetacean Mb amino acid and cDNA sequences essentially follows the classical taxonomy of cetaceans, suggesting that Mb sequence data is valid for classification of cetaceans at least to the family level. PMID:16962803

  3. Studies on monotreme proteins. VII. Amino acid sequence of myoglobin from the platypus, Ornithorhynchus anatinus.

    PubMed

    Fisher, W K; Thompson, E O

    1976-03-01

    Myoglobin isolated from skeletal muscle of the platypus contains 153 amino acid residues. The complete amino acid sequence has been determined following cleavage with cyanogen bromide and further digestion of the four fragments with trypsin, chymotrypsin, pepsin and thermolysin. Sequences of the purified peptides were determined by the dansyl-Edman procedure. The amino acid sequence showed 25 differences from human myoglobin and 24 from kangaroo myoglobin. Amino acid sequences in myoglobins are more conserved than sequences in the alpha- and beta-globin chains, and platypus myoglobin shows a similar number of variations in sequence to kangaroo myoglobin when compared with myoglobin of other species. The date of divergence of the platypus from other mammals was estimated at 102 +/- 31 million years, based on the number of amino acid differences between species and allowing for mutations during the evolutionary period. This estimate differs widely from the estimate given by similar treatment of the alpha- and beta-chain sequences and a constant rate of mutation of globin chains is not supported. PMID:962722

  4. Balbiani ring DNA: sequence comparisons and evolutionary history of a family of hierarchically repetitive protein-coding genes.

    PubMed

    Pustell, J; Kafatos, F C; Wobus, U; Bäumlein, H

    1984-01-01

    All known types of Balbiani ring (BR) genes consist of multiple, tandemly arranged, ca. 180 to 300-bp repeat units that can be divided into a constant region and a subrepeat region. The latter region includes short tandem subrepeats (SRs). Comparison of all available BR sequences using computer methods has enabled us (a) to define more precisely the constant and subrepeat regions, (b) to infer the evolutionary relationships among the various types of BR repeats, (c) to derive a consensus approximation of an ancestral sequence from a small segment of which the highly diverse present-day SRs may have originated, and (d) to detect an underlying substructure in the constant region, evident in the consensus but not in the present-day sequences and possibly corresponding to an original 39-bp DNA segment from which the extant, giant BR sequences may have evolved. We discuss the processes of reduplication, diversification, and homogenization within the hierarchically repetitive BR sequences as examples of how a simple DNA element may evolve into a diverse family of large, protein-coding genes.

  5. The complete coding region sequence of river buffalo (Bubalus bubalis) SRY gene.

    PubMed

    Parma, Pietro; Feligini, Maria; Greppi, Gianfranco; Enne, Giuseppe

    2004-02-01

    The Y-linked SRY gene is responsible for testis determination in mammals. Mutations in this gene can lead to XY gonadal dysgenesis, an abnormal sexual phenotype described in humans, cattle, horses and river buffalo. We report here the complete river buffalo SRY sequence in order to enable the genetic diagnosis of this disease. The SRY sequence was also used to confirm that the evolutionary divergence between cattle and river buffalo occurred 10 million years ago.

  6. HLA-F coding and regulatory segments variability determined by massively parallel sequencing procedures in a Brazilian population sample.

    PubMed

    Lima, Thálitta Hetamaro Ayala; Buttura, Renato Vidal; Donadi, Eduardo Antônio; Veiga-Castelli, Luciana Caricati; Mendes-Junior, Celso Teixeira; Castelli, Erick C

    2016-10-01

    Human Leucocyte Antigen F (HLA-F) is a non-classical HLA class I gene distinguished from its classical counterparts by low allelic polymorphism and distinctive expression patterns. Its exact function remains unknown. It is believed that HLA-F has tolerogenic and immune modulatory properties. Currently, there is little information regarding the HLA-F allelic variation among human populations, and the available studies have evaluated only a fraction of the HLA-F gene segment and/or have searched for known alleles only. Here we present a strategy to evaluate the complete HLA-F variability, including its 5' upstream, coding and 3' downstream segments, by using massively parallel sequencing procedures. HLA-F variability was surveyed in 196 individuals from the Brazilian Southeast. The results indicate that the HLA-F gene is indeed conserved at the protein level: thirty coding haplotypes or coding alleles were detected, encoding only four different HLA-F full-length protein molecules. Moreover, the same protein molecule is encoded by 82.45% of all coding alleles detected in this Brazilian population sample. However, the HLA-F nucleotide and haplotype variability is much higher than currently known, both in Brazilians and considering the 1000 Genomes Project data. This protein conservation is probably a consequence of the key role of HLA-F in the immune system physiology.

  7. Multiple Genome Sequences of Important Beer-Spoiling Lactic Acid Bacteria

    PubMed Central

    Geissler, Andreas J.; Vogel, Rudi F.

    2016-01-01

    Seven strains of important beer-spoiling lactic acid bacteria were sequenced using single-molecule real-time sequencing. Complete genomes were obtained for strains of Lactobacillus paracollinoides, Lactobacillus lindneri, and Pediococcus claussenii. The analysis of these genomes emphasizes the role of plasmids as the genomic foundation of beer-spoiling ability. PMID:27795248

  8. Inferences from protein and nucleic acid sequences - Early molecular evolution, divergence of kingdoms and rates of change

    NASA Technical Reports Server (NTRS)

    Dayhoff, M. O.; Barker, W. C.; Mclaughlin, P. J.

    1974-01-01

    Description of new sensitive, objective methods for establishing the probable common ancestry of very distantly related sequences and the quantitative evolutionary change which has taken place. These methods are applied to four families of proteins and nucleic acids and evolutionary trees will be derived where possible. Of the three families containing duplications of genetic material, two are nucleic acids: transfer RNA and 5S ribosomal RNA. Both of these structures are functional in the synthesis of coded proteins, and prototypes must have been present in the cell at the inception of the fundamental coding process that all living things share. There are many types of tRNA which recognize the various nucleotide triplets and the 20 amino acids. These types are thought to have arisen as a result of many gene duplications. Relationships among these types are discussed. The 5S ribosomal RNA, presently functional in both eukaryotes and prokaryotes, is very likely descended from an early form incorporating almost a complete duplication of genetic material. The amount of evolution in the various lines can again be compared. The other two families containing duplications are proteins; ferredoxin and cytochrome c.

  9. The bioinformatics of nucleotide sequence coding for proteins requiring metal coenzymes and proteins embedded with metals

    NASA Astrophysics Data System (ADS)

    Tremberger, G.; Dehipawala, Sunil; Cheung, E.; Holden, T.; Sullivan, R.; Nguyen, A.; Lieberman, D.; Cheung, T.

    2015-09-01

    All metallo-proteins need post-translation metal incorporation. In fact, the isotope ratio of Fe, Cu, and Zn in physiology and oncology have emerged as an important tool. The nickel containing F430 is the prosthetic group of the enzyme methyl coenzyme M reductase which catalyzes the release of methane in the final step of methano-genesis, a prime energy metabolism candidate for life exploration space mission in the solar system. The 3.5 Gyr early life sulfite reductase as a life switch energy metabolism had Fe-Mo clusters. The nitrogenase for nitrogen fixation 3 billion years ago had Mo. The early life arsenite oxidase needed for anoxygenic photosynthesis energy metabolism 2.8 billion years ago had Mo and Fe. The selection pressure in metal incorporation inside a protein would be quantifiable in terms of the related nucleotide sequence complexity with fractal dimension and entropy values. Simulation model showed that the studied metal-required energy metabolism sequences had at least ten times more selection pressure relatively in comparison to the horizontal transferred sequences in Mealybug, guided by the outcome histogram of the correlation R-sq values. The metal energy metabolism sequence group was compared to the circadian clock KaiC sequence group using magnesium atomic level bond shifting mechanism in the protein, and the simulation model would suggest a much higher selection pressure for the energy life switch sequence group. The possibility of using Kepler 444 as an example of ancient life in Galaxy with the associated exoplanets has been proposed and is further discussed in this report. Examples of arsenic metal bonding shift probed by Synchrotron-based X-ray spectroscopy data and Zn controlled FOXP2 regulated pathways in human and chimp brain studied tissue samples are studied in relationship to the sequence bioinformatics. The analysis results suggest that relatively large metal bonding shift amount is associated with low probability correlation R

  10. Short Time-Scale Sensory Coding in S1 during Discrimination of Whisker Vibrotactile Sequences

    PubMed Central

    Miyashita, Toshio; Lee, Daniel J.; Smith, Katherine A.; Feldman, Daniel E.

    2016-01-01

    Rodent whisker input consists of dense microvibration sequences that are often temporally integrated for perceptual discrimination. Whether primary somatosensory cortex (S1) participates in temporal integration is unknown. We trained rats to discriminate whisker impulse sequences that varied in single-impulse kinematics (5–20-ms time scale) and mean speed (150-ms time scale). Rats appeared to use the integrated feature, mean speed, to guide discrimination in this task, consistent with similar prior studies. Despite this, 52% of S1 units, including 73% of units in L4 and L2/3, encoded sequences at fast time scales (≤20 ms, mostly 5–10 ms), accurately reflecting single impulse kinematics. 17% of units, mostly in L5, showed weaker impulse responses and a slow firing rate increase during sequences. However, these units did not effectively integrate whisker impulses, but instead combined weak impulse responses with a distinct, slow signal correlated to behavioral choice. A neural decoder could identify sequences from fast unit spike trains and behavioral choice from slow units. Thus, S1 encoded fast time scale whisker input without substantial temporal integration across whisker impulses. PMID:27574970

  11. Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence

    PubMed Central

    Gordon, Kacy L.; Arthur, Robert K.; Ruvinsky, Ilya

    2015-01-01

    Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements. PMID:26020930

  12. Short Time-Scale Sensory Coding in S1 during Discrimination of Whisker Vibrotactile Sequences.

    PubMed

    McGuire, Leah M; Telian, Gregory; Laboy-Juárez, Keven J; Miyashita, Toshio; Lee, Daniel J; Smith, Katherine A; Feldman, Daniel E

    2016-08-01

    Rodent whisker input consists of dense microvibration sequences that are often temporally integrated for perceptual discrimination. Whether primary somatosensory cortex (S1) participates in temporal integration is unknown. We trained rats to discriminate whisker impulse sequences that varied in single-impulse kinematics (5-20-ms time scale) and mean speed (150-ms time scale). Rats appeared to use the integrated feature, mean speed, to guide discrimination in this task, consistent with similar prior studies. Despite this, 52% of S1 units, including 73% of units in L4 and L2/3, encoded sequences at fast time scales (≤20 ms, mostly 5-10 ms), accurately reflecting single impulse kinematics. 17% of units, mostly in L5, showed weaker impulse responses and a slow firing rate increase during sequences. However, these units did not effectively integrate whisker impulses, but instead combined weak impulse responses with a distinct, slow signal correlated to behavioral choice. A neural decoder could identify sequences from fast unit spike trains and behavioral choice from slow units. Thus, S1 encoded fast time scale whisker input without substantial temporal integration across whisker impulses. PMID:27574970

  13. Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence.

    PubMed

    Gordon, Kacy L; Arthur, Robert K; Ruvinsky, Ilya

    2015-05-01

    Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements.

  14. Sequence analysis of UTR and coding region of kappa-casein gene of Indian riverine buffalo (Bubalus bubalis).

    PubMed

    Mukesh, Manishi; Mishra, Bishnu P; Kataria, Ranjit S; Sobti, Ranbir C; Ahlawat, Shiv Pal S

    2006-04-01

    This study presents the complete nucleotide and derived amino acid sequence characterization of the water buffalo (Bubalus bubalis) kappa-casein gene. Kappa-casein cDNA clones were identified and isolated from a buffalo lactating mammary gland cDNA library. Sequence analysis of kappa-casein cDNA revealed 850 nucleotides with an open reading frame (ORF) of 573 nucleotides, encoding a mature peptide of 169 amino acids. The 5' untranslated region (UTR) comprised 71 nucleotides, while the 3' UTR comprised 206 nucleotides. A total of 11 nucleotide and seven amino acid changes were observed in buffalo (Bubalus bubalis) as compared to cattle (Bos taurus), sheep (Ovis aries) and goat (Capra hircus). Among these nucleotide changes, eight were unique to buffalo, as they were fully conserved in cattle, sheep and goat. The majority of the nucleotide changes and all the amino acid changes, 14 (Asp-Glu), 19 (Asp/Ser-Asn), 96 (Ala-Thr), 126 (Ala-Val), 128 (Ala/Gly-Val), 156 (Ala/Pro-Val) and 168 (Ala/Glu-Val), were limited to exon IV. Three glycosylation sites, Thr 131, Thr 133 and Thr 142, reported in cattle and goat kappa-casein gene, were also conserved in buffalo; in sheep, however, Thr 142 was replaced by Ala. The chymosin hydrolysis site between amino acids Phe 105 and Met 106, important for the rennet coagulation process, was found to be conserved across the four bovid species. Buffalo kappa-casein, with amino acids Thr 136 and Ala 148, seems to be an intermediate of the "A" and "B" variants of cattle. Comparison with other livestock species revealed that buffalo kappa-casein shares maximum nucleotide (95.5%) and amino acid (92.6%) similarity with cattle, whereas it showed the least sequence similarity with pig, 76.0% and 53.2%, respectively. Phylogenetic analysis based on both nucleotide and amino acid sequences indicated that buffalo kappa-casein groups with cattle, while sheep and goat form a separate cluster close to them. The non-ruminant species viz. camel, horse and pig were

  15. Draft Genome Sequences of Two Novel Acidimicrobiaceae Members from an Acid Mine Drainage Biofilm Metagenome

    PubMed Central

    Pinto, Ameet J.; Sharp, Jonathan O.; Yoder, Michael J.

    2016-01-01

    Bacteria belonging to the family Acidimicrobiaceae are frequently encountered in heavy metal-contaminated acidic environments. However, their phylogenetic and metabolic diversity is poorly resolved. We present draft genome sequences of two novel and phylogenetically distinct Acidimicrobiaceae members assembled from an acid mine drainage biofilm metagenome. PMID:26769942

  17. Complete Genome Sequence of Streptomyces clavuligerus F613-1, an Industrial Producer of Clavulanic Acid.

    PubMed

    Cao, Guangxiang; Zhong, Chuanqing; Zong, Gongli; Fu, Jiafang; Liu, Zhong; Zhang, Guimin; Qin, Ronghuo

    2016-01-01

    Streptomyces clavuligerus strain F613-1 is an industrial strain with high-yield clavulanic acid production. In this study, the complete genome sequence of S. clavuligerus strain F613-1 was determined, including one linear chromosome and one linear plasmid, carrying numerous sets of genes involved in the biosynthesis of clavulanic acid. PMID:27660792

  18. Complete Genome Sequence of Streptomyces clavuligerus F613-1, an Industrial Producer of Clavulanic Acid

    PubMed Central

    Zhong, Chuanqing; Zong, Gongli; Fu, Jiafang; Liu, Zhong; Zhang, Guimin; Qin, Ronghuo

    2016-01-01

    Streptomyces clavuligerus strain F613-1 is an industrial strain with high-yield clavulanic acid production. In this study, the complete genome sequence of S. clavuligerus strain F613-1 was determined, including one linear chromosome and one linear plasmid, carrying numerous sets of genes involved in the biosynthesis of clavulanic acid. PMID:27660792

  19. Parvalbumins from coelacanth muscle. III. Amino acid sequence of the major component.

    PubMed

    Jauregui-Adell, J; Pechere, J F

    1978-09-26

    The primary structure of the major parvalbumin (pI = 4.52) from coelacanth muscle (Latimeria chalumnae) has been determined. Sequence analysis of the tryptic peptides, in some cases obtained with beta-trypsin, accounts for the total amino acid content of the protein. Chymotryptic peptides provide appropriate sequence overlaps, to complete the localization of the tryptic peptides. Examination of the amino acid sequence of this protein shows the typical structure of a beta-parvalbumin. Its position in the dendrogram of related calcium-binding proteins corresponds to that usually accepted for crossopterygians.

  20. Cracking the Code of Human Diseases Using Next-Generation Sequencing: Applications, Challenges, and Perspectives.

    PubMed

    Precone, Vincenza; Del Monaco, Valentina; Esposito, Maria Valeria; De Palma, Fatima Domenica Elisa; Ruocco, Anna; Salvatore, Francesco; D'Argenio, Valeria

    2015-01-01

    Next-generation sequencing (NGS) technologies have greatly impacted on every field of molecular research mainly because they reduce costs and increase throughput of DNA sequencing. These features, together with the technology's flexibility, have opened the way to a variety of applications including the study of the molecular basis of human diseases. Several analytical approaches have been developed to selectively enrich regions of interest from the whole genome in order to identify germinal and/or somatic sequence variants and to study DNA methylation. These approaches are now widely used in research, and they are already being used in routine molecular diagnostics. However, some issues are still controversial, namely, standardization of methods, data analysis and storage, and ethical aspects. Besides providing an overview of the NGS-based approaches most frequently used to study the molecular basis of human diseases at DNA level, we discuss the principal challenges and applications of NGS in the field of human genomics. PMID:26665001

  1. Episodic sequence memory is supported by a theta-gamma phase code.

    PubMed

    Heusser, Andrew C; Poeppel, David; Ezzyat, Youssef; Davachi, Lila

    2016-10-01

    The meaning we derive from our experiences is not a simple static extraction of the elements but is largely based on the order in which those elements occur. Models propose that sequence encoding is supported by interactions between high- and low-frequency oscillations, such that elements within an experience are represented by neural cell assemblies firing at higher frequencies (gamma) and sequential order is encoded by the specific timing of firing with respect to a lower frequency oscillation (theta). During episodic sequence memory formation in humans, we provide evidence that items in different sequence positions exhibit greater gamma power along distinct phases of a theta oscillation. Furthermore, this segregation is related to successful temporal order memory. Our results provide compelling evidence that memory for order, a core component of an episodic memory, capitalizes on the ubiquitous physiological mechanism of theta-gamma phase-amplitude coupling. PMID:27571010

  3. Cracking the Code of Human Diseases Using Next-Generation Sequencing: Applications, Challenges, and Perspectives

    PubMed Central

    Precone, Vincenza; Del Monaco, Valentina; Esposito, Maria Valeria; De Palma, Fatima Domenica Elisa; Ruocco, Anna; D'Argenio, Valeria

    2015-01-01

    Next-generation sequencing (NGS) technologies have greatly impacted on every field of molecular research mainly because they reduce costs and increase throughput of DNA sequencing. These features, together with the technology's flexibility, have opened the way to a variety of applications including the study of the molecular basis of human diseases. Several analytical approaches have been developed to selectively enrich regions of interest from the whole genome in order to identify germinal and/or somatic sequence variants and to study DNA methylation. These approaches are now widely used in research, and they are already being used in routine molecular diagnostics. However, some issues are still controversial, namely, standardization of methods, data analysis and storage, and ethical aspects. Besides providing an overview of the NGS-based approaches most frequently used to study the molecular basis of human diseases at DNA level, we discuss the principal challenges and applications of NGS in the field of human genomics. PMID:26665001

  4. Sequencing and computational analysis of complete genome sequences of Citrus yellow mosaic badna virus from acid lime and pummelo.

    PubMed

    Borah, Basanta K; Johnson, A M Anthony; Sai Gopal, D V R; Dasgupta, Indranil

    2009-08-01

    Citrus yellow mosaic badna virus (CMBV), a member of the Family Caulimoviridae, Genus Badnavirus, is the causative agent of Citrus mosaic disease in India. Although the virus has been detected in several citrus species, only two full-length genomes, one each from Sweet orange and Rangpur lime, are available in publicly accessible databases. In order to obtain a better understanding of the genetic variability of the virus in other citrus mosaic-affected citrus species, we performed the cloning and sequence analysis of complete genomes of CMBV from two additional citrus species, Acid lime and Pummelo. We show that CMBV genomes from the two hosts share high homology with previously reported CMBV sequences and hence conclude that the new isolates represent variants of the virus present in these species. Based on in silico sequence analysis, we predict the possible function of the protein encoded by one of the five ORFs.

  5. Amino acid sequence of anionic peroxidase from the windmill palm tree Trachycarpus fortunei.

    PubMed

    Baker, Margaret R; Zhao, Hongwei; Sakharov, Ivan Yu; Li, Qing X

    2014-12-10

    Palm peroxidases are extremely stable and have uncommon substrate specificity. This study was designed to fill in the knowledge gap about the structures of a peroxidase from the windmill palm tree Trachycarpus fortunei. The complete amino acid sequence and partial glycosylation were determined by MALDI-top-down sequencing of native windmill palm tree peroxidase (WPTP), MALDI-TOF/TOF MS/MS of WPTP tryptic peptides, and cDNA sequencing. The propeptide of WPTP contained N- and C-terminal signal sequences which contained 21 and 17 amino acid residues, respectively. Mature WPTP was 306 amino acids in length, and its carbohydrate content ranged from 21% to 29%. Comparison to closely related royal palm tree peroxidase revealed structural features that may explain differences in their substrate specificity. The results can be used to guide engineering of WPTP and its novel applications.

  6. Amino acid sequence of a new mitochondrially synthesized proteolipid of the ATP synthase of Saccharomyces cerevisiae.

    PubMed Central

    Velours, J; Esparza, M; Hoppe, J; Sebald, W; Guerin, B

    1984-01-01

    The purification and the amino acid sequence of a proteolipid translated on ribosomes in yeast mitochondria is reported. This protein, which is a subunit of the ATP synthase, was purified by extraction with chloroform/methanol (2/1) and subsequent chromatography on phosphocellulose and reverse phase h.p.l.c. A mol. wt. of 5500 was estimated by chromatography on Bio-Gel P-30 in 80% formic acid. The complete amino acid sequence of this protein was determined by automated solid phase Edman degradation of the whole protein and of fragments obtained after cleavage with cyanogen bromide. The sequence analysis indicates a length of 48 amino acid residues. The calculated mol. wt. of 5870 corresponds to the value found by gel chromatography. This polypeptide contains three basic residues and no negatively charged side chain. The three basic residues are clustered at the C terminus. The primary structure of this protein is in full agreement with the predicted amino acid sequence of the putative polypeptide encoded by the mitochondrial aap1 gene recently discovered in Saccharomyces cerevisiae. Moreover, this protein shows 50% homology with the amino acid sequence of a putative polypeptide encoded by an unidentified reading frame also discovered near the mitochondrial ATPase subunit 6 gene in Aspergillus nidulans. PMID:6323165

  7. The thermostability of two kinds of recombinant ∆6-fatty acid desaturase with different N-terminal sequence lengths in low temperature.

    PubMed

    Lu, He; Zhu, Yu

    2013-09-01

    Two recombinant Rhizopus stolonifer ∆6-fatty acid desaturase enzymes with different-length N-termini were cloned and expressed in Saccharomyces cerevisiae strain INVScl. LRsD6D begins with the N-terminal sequence of the native R. stolonifer ∆6-fatty acid desaturase and encodes a deduced polypeptide of 459 amino acids (M-S-T-L-D-R-Q-S-I-F-T-I-K-E-L-E-S-I-S-Q-R-I-H-D-G-D-E-E-A-M-K-F), whereas SRsD6D begins with the amino acid sequence of the predicted ORF and encodes a deduced polypeptide of 430 amino acids (M-K-F). LRsD6D is therefore longer than SRsD6D by 29 amino acids (M-S-T-L-D-R-Q-S-I-F-T-I-K-E-L-E-S-I-S-Q-R-I-H-D-G-D-E-E-A). Bioinformatic analysis characterized the two recombinant ∆6-fatty acid desaturase enzymes with different-length N-termini, including three conserved histidine-rich motifs, hydropathy profile, and a cytochrome b5-like domain in the N-terminus. When the coding sequence was expressed in S. cerevisiae strain INVScl, the product, RsD6D, exhibited ∆6-fatty acid desaturase activity, leading to a novel peak that was detected with the same retention time as the γ-linolenic acid methyl ester standard. The residual activity of LRsD6D after 4 h at 15 °C was 74 % and that of SRsD6D was 43 %. Purified recombinant LRsD6D was more stable than SRsD6D, indicating that the N-terminal extension, containing mostly hydrophobic residues, affected the overall stability of recombinant LRsD6D.
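
    The length arithmetic in this abstract can be checked directly from the quoted residues. The following is a minimal sketch, assuming only the one-letter sequences and lengths given above; the helper name `n_terminal_extension` is hypothetical and not from the paper.

```python
# One-letter N-terminal sequences exactly as quoted in the abstract.
LONG_N_TERM = "M-S-T-L-D-R-Q-S-I-F-T-I-K-E-L-E-S-I-S-Q-R-I-H-D-G-D-E-E-A-M-K-F".replace("-", "")
SHORT_N_TERM = "M-K-F".replace("-", "")
EXTENSION = "M-S-T-L-D-R-Q-S-I-F-T-I-K-E-L-E-S-I-S-Q-R-I-H-D-G-D-E-E-A".replace("-", "")

# Deduced polypeptide lengths reported in the abstract.
LONG_LENGTH = 459
SHORT_LENGTH = 430


def n_terminal_extension(long_seq: str, short_seq: str) -> str:
    """Return the residues that precede short_seq at the end of long_seq."""
    if not long_seq.endswith(short_seq):
        raise ValueError("short sequence is not a C-terminal suffix of the long one")
    return long_seq[: len(long_seq) - len(short_seq)]


extension = n_terminal_extension(LONG_N_TERM, SHORT_N_TERM)
print(extension, len(extension))          # residues unique to LRsD6D
print(LONG_LENGTH - SHORT_LENGTH)         # difference in reported lengths
```

    Both routes, counting the quoted extension residues and subtracting the reported lengths, give the same 29-residue difference.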

  8. cDNA sequence coding for the alpha'-chain of the third complement component in the African lungfish.

    PubMed

    Sato, A; Sültmann, H; Mayer, W E; Figueroa, F; Tichy, H; Klein, J

    1999-04-01

    cDNA clones coding for almost the entire C3 alpha-chain of the African lungfish (Protopterus aethiopicus), a representative of the Sarcopterygii (lobe-finned fishes), were sequenced and characterized. From the sequence it is deduced that the lungfish C3 molecule is probably a disulphide-bonded alpha:beta dimer similar to that of the C3 components of other jawed vertebrates. The deduced sequence contains conserved sites presumably recognized by proteolytic enzymes (e.g. factor I) involved in the activation and inactivation of the component. It also contains the conserved thioester region and the putative site for binding properdin. However, the sites for the interaction with complement receptor 2 and factor H are poorly conserved. Either complement receptor 2 and factor H are not present in the lungfish or they bind to different residues at the same or a different site than mammalian complement receptor 2 and factor H. The C3 alpha-chain sequences faithfully reflect the phylogenetic relationships among vertebrate classes and can therefore be used to help to resolve the long-standing controversy concerning the origin of the tetrapods. PMID:10219761

  9. Molecular phylogenetic analysis in Hammondia-like organisms based on partial Hsp70 coding sequences

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The 70-kDa heat shock proteins (Hsp70) are considered among the most conserved proteins in all domains of life, from Archaea to eukaryotes. Hammondia heydorni, H. hammondi, Toxoplasma gondii, Neospora hughesi and N. caninum (Hammondia-like organisms) are closely related tissue cyst-forming c...

  10. Analysis of mutations in the entire coding sequence of the factor VIII gene

    SciTech Connect

    Bidichadani, S.I.; Lanyon, W.G.; Connor, J.M.

    1994-09-01

    Hemophilia A is a common X-linked recessive disorder of bleeding caused by deleterious mutations in the gene for clotting factor VIII. The large size of the factor VIII gene, the high frequency of de novo mutations and its tissue-specific expression complicate the detection of mutations. We have used a combination of RT-PCR of ectopic factor VIII transcripts and genomic DNA-PCRs to amplify the entire essential sequence of the factor VIII gene. This is followed by chemical mismatch cleavage analysis and direct sequencing in order to facilitate a comprehensive search for mutations. We describe the characterization of nine potentially pathogenic mutations, six of which are novel. In each case, a correlation of the genotype with the observed phenotype is presented. In order to evaluate the pathogenicity of the five missense mutations detected, we have analyzed them for evolutionary sequence conservation and for their involvement of sequence motifs catalogued in the PROSITE database of protein sites and patterns.

  11. Genes coding for metal induced synthesis of RNA sequences are differentially amplified and regulated in mammalian cells. [CHO cells]

    SciTech Connect

    Walters, R.A.; Enger, M.D.; Hildebrand, C.E.; Griffith, J.K.

    1981-01-01

    Three variant cell lines were isolated which survive cadmium (Cd²⁺) concentrations 10- to 200-fold greater than that which kills parental Chinese hamster cells (line CHO). Cadmium treatment of the variants induces the synthesis of a highly abundant poly(A)⁺ RNA class which directs the synthesis of metallothionein in a cell-free translation system. Hybridization of cDNA complementary to these inducible, highly abundant RNA sequences (cDNAₐ) with RNA from variant cells showed that: (1) the induced abundant class has a total complexity of approx. 2000 NT; (2) Cd²⁺ induction increases the cellular concentration of these sequences approx. 2000-fold above preinduction levels in each of the variants; and (3) most, if not all, of these sequences are expressed constitutively in uninduced cells. Cadmium induction of sensitive CHO cells increases the cellular concentration of only a subset of the sequences inducible in resistant cells and then only to a level 100-fold higher than in uninduced cells. Only approx. 50% of the sequences are constitutively expressed at measurable levels in uninduced CHO cells. Hybridization of cDNAₐ with genomic DNA from the three resistant variants showed that genes coding for the induction of specific RNA sequences are amplified approx. 10-fold in Cdʳ20F4 cells, approx. 4-fold in Cdʳ30F9 cells, and unamplified in Cdʳ2C10 cells relative to CHO. While sensitive CHO cells can tolerate only 0.2 µM Cd²⁺, Cdʳ30F9, Cdʳ20F4, and Cdʳ2C10 cells are resistant to 40 µM, 26 µM, and 2 µM Cd²⁺ respectively. Thus, gene amplification alone cannot be responsible for the observed resistance of the variant cell lines.

  12. Complete amino acid sequence and structure characterization of the taste-modifying protein, miraculin.

    PubMed

    Theerasilp, S; Hitotsuya, H; Nakajo, S; Nakaya, K; Nakamura, Y; Kurihara, Y

    1989-04-25

    The taste-modifying protein, miraculin, has the unusual property of modifying sour taste into sweet taste. The complete amino acid sequence of miraculin purified from miracle fruits by a newly developed method (Theerasilp, S., and Kurihara, Y. (1988) J. Biol. Chem. 263, 11536-11539) was determined by an automatic Edman degradation method. Miraculin was a single polypeptide with 191 amino acid residues. The calculated molecular weight based on the amino acid sequence and the carbohydrate content (13.9%) was 24,600. Asn-42 and Asn-186 were linked N-glycosidically to carbohydrate chains. High homology was found between the amino acid sequences of miraculin and soybean trypsin inhibitor. PMID:2708331

  13. Homology of amino acid sequences of rat liver cathepsins B and H with that of papain.

    PubMed Central

    Takio, K; Towatari, T; Katunuma, N; Teller, D C; Titani, K

    1983-01-01

    The amino acid sequences of rat liver lysosomal thiol endopeptidases, cathepsins B and H, are presented and compared with that of the plant thiol protease papain. The 252-residue sequence of cathepsin B and the 220-residue sequence of cathepsin H were determined largely by automated Edman degradation of their intact polypeptide chains and of the two chains of each enzyme generated by limited proteolysis. Subfragments of the chains were produced by enzymatic digestion and by chemical cleavage of methionyl and tryptophanyl bonds. Comparison of the amino acid sequences of cathepsins B and H with each other and with that of papain demonstrates a striking homology among their primary structures. Sequence identity is extremely high in regions which, according to the three-dimensional structure of papain, constitute the catalytic site. The results not only reveal the first structural features of mammalian thiol endopeptidases but also provide insight into the evolutionary relationships among plant and mammalian thiol proteases. PMID:6574504

  14. Ribosome-mediated incorporation of a non-standard amino acid into a peptide through expansion of the genetic code.

    PubMed

    Bain, J D; Switzer, C; Chamberlin, A R; Benner, S A

    1992-04-01

    One serious limitation facing protein engineers is the availability of only 20 'proteinogenic' amino acids encoded by natural messenger RNA. The lack of structural diversity among these amino acids restricts the mechanistic and structural issues that can be addressed by site-directed mutagenesis. Here we describe a new technology for incorporating non-standard amino acids into polypeptides by ribosome-based translation. In this technology, the genetic code is expanded through the creation of a 65th codon-anticodon pair from unnatural nucleoside bases having non-standard hydrogen-bonding patterns. This new codon-anticodon pair efficiently supports translation in vitro to yield peptides containing a non-standard amino acid. The versatility of the ribosome as a synthetic tool offers new possibilities for protein engineering, and compares favourably with another recently described approach in which the genetic code is simply rearranged to recruit stop codons to play a coding role. PMID:1560827

  15. Integration of Expressed Sequence Tag Data Flanking Predicted RNA Secondary Structures Facilitates Novel Non-Coding RNA Discovery

    PubMed Central

    Krzyzanowski, Paul M.; Price, Feodor D.; Muro, Enrique M.; Rudnicki, Michael A.; Andrade-Navarro, Miguel A.

    2011-01-01

    Many computational methods have been used to predict novel non-coding RNAs (ncRNAs), but none, to our knowledge, have explicitly investigated the impact of integrating existing cDNA-based Expressed Sequence Tag (EST) data that flank structural RNA predictions. To determine whether flanking EST data can assist in microRNA (miRNA) prediction, we identified genomic sites encoding putative miRNAs by combining functional RNA predictions with flanking EST data in a model consistent with miRNAs undergoing cleavage during maturation. In both human and mouse genomes, we observed that the inclusion of flanking ESTs adjacent to and not overlapping predicted miRNAs significantly improved the performance of various methods of miRNA prediction, including direct high-throughput sequencing of small RNA libraries. We analyzed the expression of hundreds of miRNAs predicted to be expressed during myogenic differentiation using a customized microarray and identified several known and predicted myogenic miRNA hairpins. Our results indicate that integrating ESTs flanking structural RNA predictions improves the quality of cleaved miRNA predictions and suggest that this strategy can be used to predict other non-coding RNAs undergoing cleavage during maturation. PMID:21698286

  16. Identification of non-coding RNAs associated with telomeres using a combination of enChIP and RNA sequencing.

    PubMed

    Fujita, Toshitsugu; Yuno, Miyuki; Okuzaki, Daisuke; Ohki, Rieko; Fujii, Hodaka

    2015-01-01

    Accumulating evidence suggests that RNAs interacting with genomic regions play important roles in the regulation of genome functions, including X chromosome inactivation and gene expression. However, to our knowledge, no non-biased methods of identifying RNAs that interact with a specific genomic region have been reported. Here, we used enChIP-RNA-Seq, a combination of engineered DNA-binding molecule-mediated chromatin immunoprecipitation (enChIP) and RNA sequencing (RNA-Seq), to perform a non-biased search for RNAs interacting with telomeres. In enChIP-RNA-Seq, the target genomic regions are captured using an engineered DNA-binding molecule such as a transcription activator-like protein. Subsequently, RNAs that interact with the target genomic regions are purified and sequenced. The RNAs detected by enChIP-RNA-Seq contained known telomere-binding RNAs, including the telomerase RNA component (Terc), the RNA component of mitochondrial RNA processing endoribonuclease (Rmrp), and Cajal body-specific RNAs. In addition, a number of novel telomere-binding non-coding RNAs were also identified. Binding of two candidate non-coding RNAs to telomeres was confirmed by immunofluorescence microscopy and RNA fluorescence in situ hybridization (RNA-FISH) analyses. The novel telomere-binding non-coding RNAs identified here may play important roles in telomere functions. To our knowledge, this study is the first non-biased identification of RNAs associated with specific genomic regions. The results presented here suggest that enChIP-RNA-Seq analyses are useful for the identification of RNAs interacting with specific genomic regions, and may help to contribute to current understanding of the regulation of genome functions.

  17. Maps, codes, and sequence elements: can we predict the protein output from an alternatively spliced locus?

    PubMed

    Sharma, Shalini; Black, Douglas L

    2006-11-22

    Alternative splicing choices are governed by splicing regulatory protein interactions with splicing silencer and enhancer elements present in the pre-mRNA. However, the prediction of these choices from genomic sequence is difficult, in part because the regulators can act as either enhancers or silencers. A recent study describes how for a particular neuronal splicing regulatory protein, Nova, the location of its binding sites is highly predictive of the protein's effect on an exon's splicing.

  18. Non-Coding RNA: Sequence-Specific Guide for Chromatin Modification and DNA Damage Signaling

    PubMed Central

    Francia, Sofia

    2015-01-01

    Chromatin conformation shapes the environment in which our genome is transcribed into RNA. Transcription is a source of DNA damage, thus it often occurs concomitantly with DNA damage signaling. Growing evidence suggests that different types of RNAs can, independently of their protein-coding properties, directly affect chromatin conformation, transcription and splicing, as well as promote the activation of the DNA damage response (DDR) and DNA repair. Therefore, transcription paradoxically functions to both threaten and safeguard genome integrity. On the other hand, DNA damage signaling is known to modulate chromatin to suppress transcription of the surrounding genetic unit. It is thus intriguing to understand how transcription can modulate DDR signaling while, in turn, DDR signaling represses transcription of chromatin around the DNA lesion. An unexpected player in this field is the RNA interference (RNAi) machinery, which plays roles in transcription, splicing and chromatin modulation in several organisms. Non-coding RNAs (ncRNAs) and several protein factors involved in the RNAi pathway are well-known master regulators of chromatin, while only recent reports show their involvement in DDR. Here, we discuss the experimental evidence supporting the idea that ncRNAs act at the genomic loci from which they are transcribed to modulate chromatin, DDR signaling and DNA repair. PMID:26617633

  19. A unified mathematical framework for coding time, space, and sequences in the hippocampal region.

    PubMed

    Howard, Marc W; MacDonald, Christopher J; Tiganj, Zoran; Shankar, Karthik H; Du, Qian; Hasselmo, Michael E; Eichenbaum, Howard

    2014-03-26

    The medial temporal lobe (MTL) is believed to support episodic memory, vivid recollection of a specific event situated in a particular place at a particular time. There is ample neurophysiological evidence that the MTL computes location in allocentric space and more recent evidence that the MTL also codes for time. Space and time represent a similar computational challenge; both are variables that cannot be simply calculated from the immediately available sensory information. We introduce a simple mathematical framework that computes functions of both spatial location and time as special cases of a more general computation. In this framework, experience unfolding in time is encoded via a set of leaky integrators. These leaky integrators encode the Laplace transform of their input. The information contained in the transform can be recovered using an approximation to the inverse Laplace transform. In the temporal domain, the resulting representation reconstructs the temporal history. By integrating movements, the equations give rise to a representation of the path taken to arrive at the present location. By modulating the transform with information about allocentric velocity, the equations code for position of a landmark. Simulated cells show a close correspondence to neurons observed in various regions for all three cases. In the temporal domain, novel secondary analyses of hippocampal time cells verified several qualitative predictions of the model. An integrated representation of spatiotemporal context can be computed by taking conjunctions of these elemental inputs, leading to a correspondence with conjunctive neural representations observed in dorsal CA1.
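
    The claim that a bank of leaky integrators encodes the Laplace transform of its input can be illustrated numerically. The following is a minimal sketch, not the authors' model: it assumes the textbook leaky-integrator equation dF/dt = -s F + f(t), and the function names, decay rates, and test signal are all hypothetical choices made for illustration.

```python
import math


def leaky_integrators(signal, rates, dt):
    """Euler-integrate dF/dt = -s*F + f(t) for each decay rate s (assumed form)."""
    state = [0.0 for _ in rates]
    for f_t in signal:
        state = [F + dt * (-s * F + f_t) for F, s in zip(state, rates)]
    return state


def laplace_of_history(signal, rates, dt):
    """Directly sum f(t_k) * exp(-s * elapsed time since t_k) * dt over the past."""
    n = len(signal)
    return [
        sum(f_t * math.exp(-s * dt * (n - 1 - k)) * dt for k, f_t in enumerate(signal))
        for s in rates
    ]


# Hypothetical input: a brief pulse between 0.5 s and 0.6 s in a 2 s history.
dt = 0.001
signal = [1.0 if 500 <= k < 600 else 0.0 for k in range(2000)]
rates = [0.5, 1.0, 2.0, 5.0]  # illustrative decay rates s

simulated = leaky_integrators(signal, rates, dt)
direct = laplace_of_history(signal, rates, dt)
for s, a, b in zip(rates, simulated, direct):
    print(f"s={s}: integrator={a:.5f}  laplace sum={b:.5f}")
```

    Under these assumptions the integrator states closely track the exponentially weighted sum over the input history, which is the sense in which the set of states across decay rates holds a Laplace transform of the past.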

  20. Emergence and Evolution of Hominidae-Specific Coding and Noncoding Genomic Sequences.

    PubMed

    Saber, Morteza Mahmoudi; Adeyemi Babarinde, Isaac; Hettiarachchi, Nilmini; Saitou, Naruya

    2016-07-12

    Family Hominidae, which includes humans and great apes, is recognized for unique complex social behavior and intellectual abilities. Despite the increasing genome data, however, the genomic origin of its phenotypic uniqueness has remained elusive. Clade-specific genes and highly conserved noncoding sequences (HCNSs) are among the high-potential evolutionary candidates involved in driving clade-specific characters and phenotypes. On this premise, we analyzed whole genome sequences along with gene orthology data retrieved from major DNA databases to find Hominidae-specific (HS) genes and HCNSs. We discovered that Down syndrome critical region 4 (DSCR4) is the only experimentally verified gene uniquely present in Hominidae. DSCR4 has no structural homology to any known protein and was inferred to have emerged in several steps through LTR/ERV1, LTR/ERVL retrotransposition, and transversion. Using the genomic distance as neutral evolution threshold, we identified 1,658 HS HCNSs. Polymorphism coverage and derived allele frequency analysis of HS HCNSs showed that these HCNSs are under purifying selection, indicating that they may harbor important functions. They are overrepresented in promoters/untranslated regions, in close proximity of genes involved in sensory perception of sound and developmental process, and also showed a significantly lower nucleosome occupancy probability. Interestingly, many ancestral sequences of the HS HCNSs showed very high evolutionary rates. This suggests that new functions emerged through some kind of positive selection, and then purifying selection started to operate to keep these functions.

  1. Emergence and Evolution of Hominidae-Specific Coding and Noncoding Genomic Sequences

    PubMed Central

    Saber, Morteza Mahmoudi; Adeyemi Babarinde, Isaac; Hettiarachchi, Nilmini; Saitou, Naruya

    2016-01-01

    Family Hominidae, which includes humans and great apes, is recognized for unique complex social behavior and intellectual abilities. Despite the increasing genome data, however, the genomic origin of its phenotypic uniqueness has remained elusive. Clade-specific genes and highly conserved noncoding sequences (HCNSs) are among the high-potential evolutionary candidates involved in driving clade-specific characters and phenotypes. On this premise, we analyzed whole genome sequences along with gene orthology data retrieved from major DNA databases to find Hominidae-specific (HS) genes and HCNSs. We discovered that Down syndrome critical region 4 (DSCR4) is the only experimentally verified gene uniquely present in Hominidae. DSCR4 has no structural homology to any known protein and was inferred to have emerged in several steps through LTR/ERV1, LTR/ERVL retrotransposition, and transversion. Using the genomic distance as neutral evolution threshold, we identified 1,658 HS HCNSs. Polymorphism coverage and derived allele frequency analysis of HS HCNSs showed that these HCNSs are under purifying selection, indicating that they may harbor important functions. They are overrepresented in promoters/untranslated regions, in close proximity of genes involved in sensory perception of sound and developmental process, and also showed a significantly lower nucleosome occupancy probability. Interestingly, many ancestral sequences of the HS HCNSs showed very high evolutionary rates. This suggests that new functions emerged through some kind of positive selection, and then purifying selection started to operate to keep these functions. PMID:27289096

  2. Amino acid sequence heterogeneity of the chromosomal encoded Borrelia burgdorferi sensu lato major antigen P100.

    PubMed

    Fellinger, W; Farencena, A; Redl, B; Sambri, V; Cevenini, R; Stöffler, G

    1995-04-01

    The entire nucleotide sequence of the chromosomally encoded major antigen p100 of the European Borrelia garinii isolate B29 was determined and the deduced amino acid sequence was compared to the homologous antigen p83 of the North American Borrelia burgdorferi sensu stricto strain B31 and the p100 of the European Borrelia afzelii (group VS461) strain PKo. p100 of strain B29 shows 87% amino acid sequence identity to strain B31 and 79.2% to strain PKo; p100 of strains B31 and PKo show 62.5% identity to each other. In addition, partial nucleotide sequences of the most heterogeneous region of the p100 gene of two other Borrelia garinii isolates (PBi and VS286) have been determined and the deduced amino acid sequences were compared with all p100 of Borrelia garinii published so far. We found an amino acid sequence identity between 88.6 and 100% within the same genospecies. The N-terminal part of the p100 proteins is highly conserved whereas a strikingly heterogeneous region within the C-terminal part of the proteins was observed.

  3. Complete Coding Sequences of One H9 and Three H7 Low-Pathogenic Influenza Viruses Circulating in Wild Birds in Belgium, 2009 to 2012

    PubMed Central

    Rosseel, Toon; Marché, Sylvie; Steensels, Mieke; Vangeluwe, Didier; Linden, Annick; van den Berg, Thierry; Lambrecht, Bénédicte

    2016-01-01

    The complete coding sequences of four avian influenza A viruses (two H7N7, one H7N1, and one H9N2) circulating in wild waterfowl in Belgium from 2009 to 2012 were determined using Illumina sequencing. All viral genome segments represent viruses circulating in the Eurasian wild bird population. PMID:27284153

  4. Coding Variants at Hexa-allelic Amino Acid 13 of HLA-DRB1 Explain Independent SNP Associations with Follicular Lymphoma Risk

    PubMed Central

    Foo, Jia Nee; Smedby, Karin E.; Akers, Nicholas K.; Berglund, Mattias; Irwan, Ishak D.; Jia, Xiaoming; Li, Yi; Conde, Lucia; Darabi, Hatef; Bracci, Paige M.; Melbye, Mads; Adami, Hans-Olov; Glimelius, Bengt; Khor, Chiea Chuen; Hjalgrim, Henrik; Padyukov, Leonid; Humphreys, Keith; Enblad, Gunilla; Skibola, Christine F.; de Bakker, Paul I.W.; Liu, Jianjun

    2013-01-01

    Non-Hodgkin lymphoma represents a diverse group of blood malignancies, of which follicular lymphoma (FL) is a common subtype. Previous genome-wide association studies (GWASs) have identified in the human leukocyte antigen (HLA) class II region multiple independent SNPs that are significantly associated with FL risk. To dissect these signals and determine whether coding variants in HLA genes are responsible for the associations, we conducted imputation, HLA typing, and sequencing in three independent populations for a total of 689 cases and 2,446 controls. We identified a hexa-allelic amino acid polymorphism at position 13 of the HLA-DR beta chain that showed the strongest association with FL within the major histocompatibility complex (MHC) region (multiallelic p = 2.3 × 10⁻¹⁵). Out of six possible amino acids that occurred at that position within the population, we classified two as high risk (Tyr and Phe), two as low risk (Ser and Arg), and two as moderate risk (His and Gly). There was a 4.2-fold difference in risk (95% confidence interval = 2.9–6.1) between subjects carrying two alleles encoding high-risk amino acids and those carrying two alleles encoding low-risk amino acids (p = 1.01 × 10⁻¹⁴). This coding variant might explain the complex SNP associations identified by GWASs and suggests a common HLA-DR antigen-driven mechanism for the pathogenesis of FL and rheumatoid arthritis. PMID:23791106

  5. Coding variants at hexa-allelic amino acid 13 of HLA-DRB1 explain independent SNP associations with follicular lymphoma risk.

    PubMed

    Foo, Jia Nee; Smedby, Karin E; Akers, Nicholas K; Berglund, Mattias; Irwan, Ishak D; Jia, Xiaoming; Li, Yi; Conde, Lucia; Darabi, Hatef; Bracci, Paige M; Melbye, Mads; Adami, Hans-Olov; Glimelius, Bengt; Khor, Chiea Chuen; Hjalgrim, Henrik; Padyukov, Leonid; Humphreys, Keith; Enblad, Gunilla; Skibola, Christine F; de Bakker, Paul I W; Liu, Jianjun

    2013-07-11

    Non-Hodgkin lymphoma represents a diverse group of blood malignancies, of which follicular lymphoma (FL) is a common subtype. Previous genome-wide association studies (GWASs) have identified in the human leukocyte antigen (HLA) class II region multiple independent SNPs that are significantly associated with FL risk. To dissect these signals and determine whether coding variants in HLA genes are responsible for the associations, we conducted imputation, HLA typing, and sequencing in three independent populations for a total of 689 cases and 2,446 controls. We identified a hexa-allelic amino acid polymorphism at position 13 of the HLA-DR beta chain that showed the strongest association with FL within the major histocompatibility complex (MHC) region (multiallelic p = 2.3 × 10⁻¹⁵). Out of six possible amino acids that occurred at that position within the population, we classified two as high risk (Tyr and Phe), two as low risk (Ser and Arg), and two as moderate risk (His and Gly). There was a 4.2-fold difference in risk (95% confidence interval = 2.9-6.1) between subjects carrying two alleles encoding high-risk amino acids and those carrying two alleles encoding low-risk amino acids (p = 1.01 × 10⁻¹⁴). This coding variant might explain the complex SNP associations identified by GWASs and suggests a common HLA-DR antigen-driven mechanism for the pathogenesis of FL and rheumatoid arthritis.

  6. ANATOMICAL MNEMONICS OF THE GENETIC CODE: A FUNCTIONAL ICOSAHEDRON AND THE VIGESIMAL SYSTEM OF THE MAYA TO REPRESENT THE TWENTY PROTEINOGENIC AMINO ACIDS

    PubMed Central

    CASTRO-CHAVEZ, FERNANDO

    2016-01-01

    In programming and bioinformatics, the graphical interface is vital to describe and to abbreviate aspects and concepts of the physical world. The Mayan Culture developed the vigesimal system, a numerical system based on their count of fingers and toes. My objective is to equate the Mayan system and their numerical representation to the twenty amino acids according to size, except for the number one, represented by a dot, that here is given to cysteine, which acts as glue among peptides as one of its properties; in such a way, two vertical dots will be easily used to represent its related selenocysteine. The Mayan numerical system included the zero, represented by the Maya with an empty shell that here is used to represent the stop codons. On the other hand, the Chinese had a binary numerical system, similar to the binary comparisons of the three properties of Nucleotides within the double helix: H-Bonds, C-Rings and Tautomerism, called the I Ching which here is applied to the natural groups of amino acids that result of the 64-codons compared in binary in their H-Bonds versus their C-Rings, used here to successfully represent the mature sequence of the glucagon amino acids. Additional anatomical tools for the mnemonics of the genetic code and of its amino acid groups are also presented, as well as a functional icosahedron to represent them. Concluding, tools are presented for the visual analysis of proteins and peptide sequencing in bioinformatics and education to teach the genetic code and its resulting amino acids, plus their numerical systems. PMID:27081676

  7. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1997-01-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided.

  8. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1997-04-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided. 7 figs.

  9. Nucleotide sequences of three distinct clones coding for rat heavy chain class 1 major histocompatibility antigens

    SciTech Connect

    Wang, M.; Stepkowski, S.M.; Tain, L.

    1996-09-01

    Poly(A)⁺ RNAs were isolated from concanavalin A-stimulated splenocytes of BUF (RT1.Aᵇ), PVG (RT1.Aᶜ), or PVG.1U (RT1.Aᵘ) rats, respectively, using a Micro-Fast Track kit. After reverse transcription with a synthetic oligo-d(T) primer (5′-CAT GAT CGA ATT CAC GCG TCT AGA TTT TTT TTT TTT TTT TTT TTT TTT TVN-3′, V = A+G+C, N = A+T+G+C; Genosys, Woodland, TX), 1.6 kilobase products, which encode the entire MHC class I protein and the 3′ non-translated region including the poly-A tail, were amplified by polymerase chain reaction (PCR) using two synthetic oligonucleotide primers (Genosys). The upstream primer (5′-GTC CGG GWT CTC AGA TGG GGC-3′, W = A+T) was designed based upon the published rat class I sequences of eight genes: RT1.1ᵃ M31018; rat LW2 gene X70066; RT1.1¹, L26224 X79719; RT1.Aᵘ X82669, and RT1.Aw3 L40363, RT1.Eᵘ L40365, RT1.C¹ L40362. The downstream primer (5′-ATG ATC GAA TTC ACG CGT CTA GA-3′) was the portion of the oligo-d(T) primer used for reverse transcription. The purified PCR products were inserted into pCR II cloning vectors (Invitrogen). Automated sequencing results for plasmid cDNAs from the positive clones, which were obtained from three repeated PCR amplifications and identified by restriction enzyme mapping, were reproducible. The new sequences of the heavy chain class I genes are compared with those available in GenBank. 7 refs., 1 fig.
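
    The degenerate positions in the primers above (W, V, N) each stand for several possible bases, so one written primer describes a small pool of concrete sequences. The following is a minimal Python sketch of that expansion, assuming the code meanings stated in the abstract (W = A or T; V = A, C or G; N = any base). The function name `expand_degenerate` is illustrative and does not come from the source.

```python
from itertools import product

# Degenerate nucleotide codes used by the primers in the abstract,
# plus the four plain bases (a subset of the full IUPAC table).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT",    # W = A or T
    "V": "ACG",   # V = A, C or G (anything but T)
    "N": "ACGT",  # N = any base
}

def expand_degenerate(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate primer."""
    seq = primer.replace(" ", "").upper()
    choices = [IUPAC[base] for base in seq]
    return ["".join(combo) for combo in product(*choices)]

# Upstream primer from the abstract: one W position, so two variants.
upstream = "GTC CGG GWT CTC AGA TGG GGC"
for variant in expand_degenerate(upstream):
    print(variant)

# The 3' anchor "VN" of the oligo-d(T) primer covers 3 * 4 = 12 dinucleotides.
print(len(expand_degenerate("VN")))
```

    Because V excludes T, the anchor cannot pair inside the poly-A tail itself, which is the usual reason for ending an oligo-d(T) primer in VN.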

  10. Possible antiviral effect of ciprofloxacin treatment on polyomavirus BK replication and analysis of non-coding control region sequences

    PubMed Central

    2013-01-01

    Acute renal dysfunction (ARD) is a common complication in renal transplant recipients. Multiple factors contribute to ARD development, including acute rejection and microbial infections. Many viral infections after kidney transplantation result from reactivation of “latent” viruses in the host or from the graft, such as the human Polyomavirus BK (BKV). We report the case of a 39-year-old recipient of a 2nd kidney graft who experienced BKV reactivation after a second episode of acute humoral rejection. A 10-day treatment with the quinolone antibiotic ciprofloxacin was administered with an increase of immunosuppressive therapy despite the active BKV replication. Real Time PCR analysis performed after treatment with ciprofloxacin unexpectedly showed clearance of BK viremia and regression of BK viruria. During the follow-up, BK viremia remained undetectable while viruria decreased further and disappeared after 3 months. BKV non-coding control region sequence analysis from all positive samples always showed the presence of archetypal sequences, with two single-nucleotide substitutions and one nucleotide deletion that, interestingly, were all representative of the subtype/subgroup I/b-1 we identified by the viral protein 1 sequencing analysis. We report the potential effect of the quinolone antibiotic ciprofloxacin in the decrease of the BKV load in both blood and urine. PMID:24004724

  11. Possible antiviral effect of ciprofloxacin treatment on polyomavirus BK replication and analysis of non-coding control region sequences.

    PubMed

    Umbro, Ilaria; Anzivino, Elena; Tinti, Francesca; Zavatto, Assunta; Bellizzi, Anna; Rodio, Donatella Maria; Mancini, Carlo; Pietropaolo, Valeria; Mitterhofer, Anna Paola

    2013-01-01

    Acute renal dysfunction (ARD) is a common complication in renal transplant recipients. Multiple factors contribute to ARD development, including acute rejection and microbial infections. Many viral infections after kidney transplantation result from reactivation of "latent" viruses in the host or from the graft, such as the human Polyomavirus BK (BKV). We report the case of a 39-year-old recipient of a 2nd kidney graft who experienced BKV reactivation after a second episode of acute humoral rejection. A 10-day treatment with the quinolone antibiotic ciprofloxacin was administered with an increase of immunosuppressive therapy despite the active BKV replication. Real Time PCR analysis performed after treatment with ciprofloxacin unexpectedly showed clearance of BK viremia and regression of BK viruria. During the follow-up, BK viremia remained undetectable while viruria decreased further and disappeared after 3 months. BKV non-coding control region sequence analysis from all positive samples always showed the presence of archetypal sequences, with two single-nucleotide substitutions and one nucleotide deletion that, interestingly, were all representative of the subtype/subgroup I/b-1 we identified by the viral protein 1 sequencing analysis. We report the potential effect of the quinolone antibiotic ciprofloxacin in the decrease of the BKV load in both blood and urine.

  12. Possible antiviral effect of ciprofloxacin treatment on polyomavirus BK replication and analysis of non-coding control region sequences.

    PubMed

    Umbro, Ilaria; Anzivino, Elena; Tinti, Francesca; Zavatto, Assunta; Bellizzi, Anna; Rodio, Donatella Maria; Mancini, Carlo; Pietropaolo, Valeria; Mitterhofer, Anna Paola

    2013-01-01

    Acute renal dysfunction (ARD) is a common complication in renal transplant recipients. Multiple factors contribute to ARD development, including acute rejection and microbial infections. Many viral infections after kidney transplantation result from reactivation of "latent" viruses in the host or from the graft, such as the human Polyomavirus BK (BKV). We report the case of a 39-year-old recipient of a 2nd kidney graft who experienced BKV reactivation after a second episode of acute humoral rejection. A 10-day treatment with the quinolone antibiotic ciprofloxacin was administered with an increase of immunosuppressive therapy despite the active BKV replication. Real Time PCR analysis performed after treatment with ciprofloxacin unexpectedly showed clearance of BK viremia and regression of BK viruria. During the follow-up, BK viremia remained undetectable while viruria decreased further and disappeared after 3 months. BKV non-coding control region sequence analysis from all positive samples always showed the presence of archetypal sequences, with two single-nucleotide substitutions and one nucleotide deletion that, interestingly, were all representative of the subtype/subgroup I/b-1 we identified by the viral protein 1 sequencing analysis. We report the potential effect of the quinolone antibiotic ciprofloxacin in the decrease of the BKV load in both blood and urine. PMID:24004724

  13. Identification of internal transcribed spacer sequence motifs in truffles: a first step toward their DNA bar coding.

    PubMed

    El Karkouri, Khalid; Murat, Claude; Zampieri, Elisa; Bonfante, Paola

    2007-08-01

    This work presents DNA sequence motifs from the internal transcribed spacer (ITS) of the nuclear rRNA repeat unit which are useful for the identification of five European and Asiatic truffles (Tuber magnatum, T. melanosporum, T. indicum, T. aestivum, and T. mesentericum). Truffles are edible mycorrhizal ascomycetes that show similar morphological characteristics but that have distinct organoleptic and economic values. A total of 36 out of 46 ITS1 or ITS2 sequence motifs have allowed an accurate in silico distinction of the five truffles to be made (i.e., by pattern matching and/or BLAST analysis on downloaded GenBank sequences and directly against GenBank databases). The motifs considered the intraspecific genetic variability of each species, including rare haplotypes, and assigned their respective species from either the ascocarps or ectomycorrhizas. The data indicate that short ITS1 or ITS2 motifs (≤ 50 bp in size) can be considered promising tools for truffle species identification. A dot blot hybridization analysis of T. magnatum and T. melanosporum compared with other close relatives or distant lineages allowed at least one highly specific motif to be identified for each species. These results were confirmed in a blind test which included new field isolates. The current work has provided a reliable new tool for a truffle oligonucleotide bar code and identification in ecological and evolutionary studies. PMID:17601808

  14. Indole-3-acetic acid: A widespread physiological code in interactions of fungi with other organisms

    PubMed Central

    Fu, Shih-Feng; Wei, Jyuan-Yu; Chen, Hung-Wei; Liu, Yen-Yu; Lu, Hsueh-Yu; Chou, Jui-Yu

    2015-01-01

    Plants as well as microorganisms, including bacteria and fungi, produce indole-3-acetic acid (IAA). IAA is the most common plant hormone of the auxin class and it regulates various aspects of plant growth and development. Thus, research is underway globally to exploit the potential for developing IAA-producing fungi for promoting plant growth and protection for sustainable agriculture. Phylogenetic evidence suggests that IAA biosynthesis evolved independently in bacteria, microalgae, fungi, and plants. Present studies show that IAA regulates the physiological response and gene expression in these microorganisms. The convergent evolution of IAA production leads to the hypothesis that natural selection might have favored IAA as a widespread physiological code in these microorganisms and their interactions. We summarize recent studies of IAA biosynthetic pathways and discuss the role of IAA in fungal ecology. PMID:26179718

  15. Identification of small non-coding RNAs in the planarian Dugesia japonica via deep sequencing.

    PubMed

    Qin, Yun-Fei; Zhao, Jin-Mei; Bao, Zhen-Xia; Zhu, Zhao-Yu; Mai, Jia; Huang, Yi-Bo; Li, Jian-Biao; Chen, Ge; Lu, Ping; Chen, San-Jun; Su, Lin-Lin; Fang, Hui-Min; Lu, Ji-Ke; Zhang, Yi-Zhe; Zhang, Shou-Tao

    2012-05-01

    The freshwater planarian flatworm possesses an extraordinary ability to regenerate lost body parts after amputation, making it an ideal model organism for regeneration and stem cell biology. Recently, small RNAs (sRNAs) have attracted increasing attention and have been studied in many contexts, including regeneration and stem cell biology. In the current study, the large-scale cloning and sequencing of sRNAs from the intact and regenerative planarian Dugesia japonica are reported. Sequence analysis shows that sRNAs between 18 nt and 40 nt are mainly microRNAs and piRNAs. In addition, 209 conserved miRNAs and 12 novel miRNAs are identified. Notably, an improved target-screening method, based on the negative correlation between miRNAs and mRNAs, is adopted to improve target prediction accuracy. As for miRNAs, a diverse population of piRNAs and their changes between the two samples are also listed. The present study is the first to report on the important role of sRNAs during regeneration in the planarian Dugesia japonica. PMID:22425900
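
    The abstract does not spell out how the negative-correlation screen works. One plausible reading, sketched below in Python purely as an assumption, is to keep a predicted miRNA-mRNA pair only when the two change in opposite directions between the intact and regenerating samples. All names, thresholds and counts here are hypothetical placeholders, not data or methods from the study.

```python
from math import log2

def log2_fold_change(intact: float, regenerating: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of regenerating over intact abundance (pseudocount avoids log of zero)."""
    return log2((regenerating + pseudocount) / (intact + pseudocount))

def anticorrelated_pairs(mirna_counts, mrna_counts, predicted_pairs, min_abs_lfc=1.0):
    """Keep predicted (miRNA, mRNA) pairs whose abundances move in opposite directions."""
    kept = []
    for mirna, mrna in predicted_pairs:
        mirna_lfc = log2_fold_change(*mirna_counts[mirna])
        mrna_lfc = log2_fold_change(*mrna_counts[mrna])
        both_change = abs(mirna_lfc) >= min_abs_lfc and abs(mrna_lfc) >= min_abs_lfc
        if both_change and mirna_lfc * mrna_lfc < 0:
            kept.append((mirna, mrna))
    return kept

# Made-up read counts as (intact, regenerating) tuples.
mirna_counts = {"miR-x": (99, 399), "miR-y": (199, 199)}
mrna_counts = {"gene1": (799, 199), "gene2": (99, 399)}
predicted = [("miR-x", "gene1"), ("miR-x", "gene2"), ("miR-y", "gene1")]

print(anticorrelated_pairs(mirna_counts, mrna_counts, predicted))
```

    With only two samples a correlation coefficient is not informative, which is why this sketch compares the direction of change instead.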

  16. Ligation with nucleic acid sequence-based amplification.

    PubMed

    Ong, Carmichael; Tai, Warren; Sarma, Aartik; Opal, Steven M; Artenstein, Andrew W; Tripathi, Anubhav

    2012-01-01

    This work presents a novel method for detecting nucleic acid targets using a ligation step along with an isothermal, exponential amplification step. We use an engineered ssDNA with two variable regions on the ends, allowing us to design the probe for optimal reaction kinetics and primer binding. This two-part probe is ligated by T4 DNA Ligase only when both parts bind adjacently to the target. The assay demonstrates that the expected 72-nt RNA product appears only when the synthetic target, T4 ligase, and both probe fragments are present during the ligation step. An extraneous 38-nt RNA product also appears due to linear amplification of unligated probe (P3), but its presence does not cause a false-positive result. In addition, 40 mmol/L KCl in the final amplification mix was found to be optimal. It was also found that increasing P5 in excess of P3 helped with ligation and reduced the extraneous 38-nt RNA product. The assay was also tested with a single nucleotide polymorphism target, changing one base at the ligation site. The assay was able to yield a negative signal despite only a single-base change. Finally, using P3 and P5 with longer binding sites results in increased overall sensitivity of the reaction, showing that increasing ligation efficiency can improve the assay overall. We believe that this method can be used effectively for a number of diagnostic assays. PMID:22449695

  17. The amino acid sequence of mitogenic lectin-B from the roots of pokeweed (Phytolacca americana).

    PubMed

    Yamaguchi, K; Yurino, N; Kino, M; Ishiguro, M; Funatsu, G

    1997-04-01

    The complete amino acid sequence of pokeweed lectin-B (PL-B) has been analyzed by first sequencing seven lysylendopeptidase peptides derived from the reduced and S-pyridylethylated PL-B and then connecting them by analyzing the arginylendopeptidase peptides from the reduced and S-carboxymethylated PL-B. PL-B consists of 295 amino acid residues and two oligosaccharides linked to Asn96 and Asn139, and has a molecular mass of 34,493 Da. PL-B is composed of seven repetitive chitin-binding domains having 48-79% sequence homology with each other. Twelve amino acid residues including eight cysteine residues in these domains are absolutely conserved in all other chitin-binding domains of plant lectins and class I chitinases. Also, it was strongly suggested that the extremely high hemagglutinating and mitogenic activities of PL-B may be ascribed to its seven-domain structure.

  18. ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids.

    PubMed

    Ashkenazy, Haim; Erez, Elana; Martz, Eric; Pupko, Tal; Ben-Tal, Nir

    2010-07-01

    It is informative to detect highly conserved positions in proteins and nucleic acid sequence/structure since they are often indicative of structural and/or functional importance. ConSurf (http://consurf.tau.ac.il) and ConSeq (http://conseq.tau.ac.il) are two well-established web servers for calculating the evolutionary conservation of amino acid positions in proteins using an empirical Bayesian inference, starting from protein structure and sequence, respectively. Here, we present the new version of the ConSurf web server that combines the two independent servers, providing an easier and more intuitive step-by-step interface, while offering the user more flexibility during the process. In addition, the new version of ConSurf calculates the evolutionary rates for nucleic acid sequences. The new version is freely available at: http://consurf.tau.ac.il/.

  19. RAD6⁺ gene of Saccharomyces cerevisiae codes for two mutationally separable deoxyribonucleic acid repair functions

    SciTech Connect

    Tuite, M.F.; Cox, B.S.

    1981-02-01

    The response of two mutant alleles of the RAD6⁺ gene of Saccharomyces cerevisiae to the ochre translational suppressor SUQ5 was determined. Both the ultraviolet sensitivity phenotype and the deficiency in ultraviolet-induced mutagenesis phenotype of the rad6-1 allele were suppressed in a (psi⁺) background. For the rad6-3 allele, only the ultraviolet-sensitivity phenotype was suppressible in a (psi⁺) background. An SUQ5 rad6-3 (psi⁺) strain that was examined showed the normal rad6-3 deficiency in ultraviolet-induced mutagenesis. The authors propose that the RAD6⁺ gene is divided into two cistrons, RAD6A and RAD6B. RAD6A codes for an activity responsible for the error-prone repair of ultraviolet-induced lesions in deoxyribonucleic acid but is not involved in a cell's resistance to the lethal effects of ultraviolet light. RAD6B codes for an activity essential for error-free repair of potentially lethal mutagenic damage.

  20. Shark myoglobins. II. Isolation, characterization and amino acid sequence of myoglobin from Galeorhinus japonicus.

    PubMed

    Suzuki, T; Suzuki, T; Yata, T

    1985-01-01

    Native oxymyoglobin (MbO2) was isolated from red muscle of G. japonicus by chromatographic separation from metmyoglobin (metMb) on DEAE-cellulose and the amino acid sequence of the major chain was determined with the aid of sequence homology with that of G. australis. It was shown to differ in amino acid sequence from that of G. australis by 10 replacements, to be acetylated at the amino terminus and to contain glutamine at the distal (E7) residue. It was also shown to have a spectrum very similar to that of mammalian MbO2. However, the pH-dependence for the autoxidation of MbO2 was seen to be quite different from that of sperm whale (Physeter catodon) MbO2. Although the sequence homology between sperm whale and G. japonicus myoglobins is about 40%, their hydropathy profiles were very similar, indicating that they have a similar geometry in their globin folding.
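
    Hydropathy profiles like those compared in the abstract above are commonly computed as a sliding-window average of per-residue hydropathy values. The Python sketch below assumes the Kyte-Doolittle scale and a 9-residue window; the abstract does not say which scale or window the authors used, and the peptide is a toy example rather than a myoglobin sequence.

```python
# Kyte-Doolittle hydropathy index: positive = hydrophobic, negative = hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence: str, window: int = 9) -> list[float]:
    """Mean hydropathy of each full window sliding along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]

# Toy peptide: a hydrophobic stretch followed by a charged stretch.
toy_peptide = "LIVALIVAGKDEKRDEK"
for position, value in enumerate(hydropathy_profile(toy_peptide), start=1):
    print(f"window starting at residue {position}: {value:+.2f}")
```

    Two proteins can share a similar profile even at about 40% sequence identity, because many substitutions swap one residue for another of similar hydropathy.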

  1. Using Triple Helix Forming Peptide Nucleic Acids for Sequence-selective Recognition of Double-stranded RNA

    PubMed Central

    Hnedzko, Dziyana; Cheruiyot, Samwel K.; Rozners, Eriks

    2014-01-01

    Non-coding RNAs play important roles in regulation of gene expression. Specific recognition and inhibition of these biologically important RNAs that form complex double-helical structures will be highly useful for fundamental studies in biology and practical applications in medicine. This protocol describes a strategy developed in our laboratory for sequence-selective recognition of double-stranded RNA (dsRNA) using triple helix forming peptide nucleic acids (PNAs) that bind in the major groove of the RNA helix. The strategy uses chemically modified nucleobases, such as 2-aminopyridine (M), which enables strong triple helical binding at physiologically relevant conditions, and 2-pyrimidinone (P) and 3-oxo-2,3-dihydropyridazine (E), which enable recognition of isolated pyrimidines in the purine-rich strand of the RNA duplex. Detailed protocols for preparation of modified PNA monomers, solid-phase synthesis and HPLC purification of PNA oligomers, and measuring dsRNA binding affinity using isothermal titration calorimetry are included. PMID:25199637

  2. Genome sequence of the acid-tolerant Burkholderia sp. strain WSM2232 from Karijini National Park, Australia

    PubMed Central

    Walker, Robert; Watkin, Elizabeth; Tian, Rui; Bräu, Lambert; O’Hara, Graham; Goodwin, Lynne; Han, James; Reddy, Tatiparthi; Huntemann, Marcel; Pati, Amrita; Woyke, Tanja; Mavromatis, Konstantinos; Markowitz, Victor; Ivanova, Natalia; Kyrpides, Nikos; Reeve, Wayne

    2013-01-01

    Burkholderia sp. strain WSM2232 is an aerobic, motile, Gram-negative, non-spore-forming acid-tolerant rod that was trapped in 2001 from acidic soil collected from Karijini National Park (Australia) using Gastrolobium capitatum as a host. WSM2232 was effective in nitrogen fixation with G. capitatum but subsequently lost symbiotic competence during long-term storage. Here we describe the features of Burkholderia sp. strain WSM2232, together with genome sequence information and its annotation. The 7,208,311 bp standard-draft genome is arranged into 72 scaffolds of 72 contigs containing 6,322 protein-coding genes and 61 RNA-only encoding genes. The loss of symbiotic capability can now be attributed to the loss of nodulation and nitrogen fixation genes from the genome. This rhizobial genome is one of 100 sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project. PMID:25197442

  3. Visible sensing of nucleic acid sequences using a genetically encodable unmodified mRNA probe.

    PubMed

    Narita, Atsushi; Ogawa, Kazumasa; Sando, Shinsuke; Aoyama, Yasuhiro

    2006-01-01

    We previously reported a molecular beacon-mRNA (MB-mRNA) strategy for nucleic acid detection/sensing in a cell-free translation system using unmodified RNA as a probe. Here we report that combining this strategy with RNase H activity, which induces an additional process of irreversible cleavage of the MB domain, achieves improved sequence selectivity (one-nucleotide selectivity) and enhanced sensitivity. This improved system finally enabled visible sensing of a target nucleic acid sequence at single-nucleotide resolution under isothermal conditions.

  4. Amino acid and cDNA sequences of lysozyme from Hyalophora cecropia

    PubMed Central

    Engström, Å.; Xanthopoulos, K. G.; Boman, H. G.; Bennich, H.

    1985-01-01

    The amino acid and cDNA sequences of lysozyme from the giant silk moth Hyalophora cecropia have been determined. This enzyme is one of several immune proteins produced by the diapausing pupae after injection of bacteria. Cecropia lysozyme is composed of 120 amino acids, has a mol. wt. of 13.8 kd and shows great similarity with vertebrate lysozymes of the chicken type. The amino acid residues responsible for the catalytic activity and for the binding of substrate are essentially conserved. Three allelic variants of the Cecropia enzyme are identified. A comparison of the chicken and the Cecropia lysozymes shows that there is a 40% identity at both the amino acid and the nucleotide level. Some evolutionary aspects of the sequence data are discussed. PMID:16453632
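
    A figure such as the 40% identity quoted above is obtained by counting matching positions in a pairwise alignment. The Python sketch below assumes the two sequences are already aligned to equal length with `-` as the gap character, and that gapped columns are excluded from the denominator; other conventions exist, and the abstract does not state which one was used. The sequences are toy strings, not lysozyme data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of ungapped alignment columns in which both sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

print(percent_identity("ACDEFGHIKL", "ACDQFGHVKL"))  # 8 of 10 columns match
print(percent_identity("AC-EF", "ACDEY"))            # 3 of 4 ungapped columns match
```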

  5. Color differences among feral pigeons (Columba livia) are not attributable to sequence variation in the coding region of the melanocortin-1 receptor gene (MC1R)

    PubMed Central

    2013-01-01

    Background: Genetic variation at the melanocortin-1 receptor (MC1R) gene is correlated with melanin color variation in many birds. Feral pigeons (Columba livia) show two major melanin-based colorations: a red coloration due to pheomelanic pigment and a black coloration due to eumelanic pigment. Furthermore, within each color type, feral pigeons display continuous variation in the amount of melanin pigment present in the feathers, with individuals varying from pure white to a full dark melanic color. Coloration is highly heritable and it has been suggested that it is under natural or sexual selection, or both. Our objective was to investigate whether MC1R allelic variants are associated with plumage color in feral pigeons. Findings: We sequenced 888 bp of the coding sequence of MC1R among pigeons varying both in the type, eumelanin or pheomelanin, and the amount of melanin in their feathers. We detected 10 non-synonymous substitutions and 2 synonymous substitutions, but none of them was associated with a plumage type. It remains possible that non-synonymous substitutions that influence coloration are present in the short MC1R fragment that we did not sequence, but this seems unlikely because we analyzed the entire functionally important region of the gene. Conclusions: Our results show that color differences among feral pigeons are probably not attributable to amino acid variation at the MC1R locus. Therefore, variation in regulatory regions of MC1R or variation in other genes may be responsible for the color polymorphism of feral pigeons. PMID:23915680
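
    Whether a coding substitution is synonymous or non-synonymous depends on whether the affected codon still translates to the same amino acid. The Python sketch below illustrates that classification under the standard genetic code, comparing two in-frame, equal-length coding sequences codon by codon. It counts differing codons rather than individual nucleotide changes, and the sequences are toy examples, not MC1R data.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_substitutions(reference: str, variant: str) -> dict[str, int]:
    """Count codons that differ with and without an amino acid change."""
    if len(reference) != len(variant) or len(reference) % 3 != 0:
        raise ValueError("sequences must be in frame and of equal length")
    counts = {"synonymous": 0, "non-synonymous": 0}
    for i in range(0, len(reference), 3):
        ref_codon, var_codon = reference[i:i + 3], variant[i:i + 3]
        if ref_codon == var_codon:
            continue
        same_aa = CODON_TABLE[ref_codon] == CODON_TABLE[var_codon]
        counts["synonymous" if same_aa else "non-synonymous"] += 1
    return counts

# Toy example: CTG -> CTA keeps leucine, GAA -> GAT changes Glu to Asp.
print(classify_substitutions("ATGCTGGAA", "ATGCTAGAT"))
```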

  6. RNA-sequencing reveals previously unannotated protein- and microRNA-coding genes expressed in aleurone cells of rice seeds.

    PubMed

    Watanabe, Kenneth A; Ringler, Patricia; Gu, Lingkun; Shen, Qingxi J

    2014-01-01

    The rice genome annotation has been greatly improved in recent years, largely due to the availability of full length cDNA sequences derived from many tissues. Among those yet to be studied is the aleurone layer, which produces hydrolases for mobilization of seed storage reserves during seed germination and post germination growth. Herein, we report transcriptomes of aleurone cells treated with the hormones abscisic acid, gibberellic acid, or both. Using a comprehensive approach, we identified hundreds of novel genes. To minimize the number of false positives, only transcripts that did not overlap with existing annotations, had a high level of expression, and showed a high level of uniqueness within the rice genome were considered to be novel genes. This approach led to the identification of 553 novel genes that encode proteins and/or microRNAs. The transcriptome data reported here will help to further improve the annotation of the rice genome.

  7. Combining protein identification and quantification: C-terminal isotope-coded tagging using sulfanilic acid.

    PubMed

    Panchaud, Alexandre; Guillaume, Elisabeth; Affolter, Michael; Robert, Fabien; Moreillon, Philippe; Kussmann, Martin

    2006-01-01

    Two methods of differential isotopic coding of carboxylic groups have been developed to date. The first approach uses d0- or d3-methanol to convert carboxyl groups into the corresponding methyl esters. The second relies on the incorporation of two ¹⁸O atoms into the C-terminal carboxylic group during tryptic digestion of proteins in H₂¹⁸O. However, both methods have limitations such as chromatographic separation of ¹H and ²H derivatives or overlap of isotopic distributions of light and heavy forms due to small mass shifts. Here we present a new tagging approach based on the specific incorporation of sulfanilic acid into carboxylic groups. The reagent was synthesized in a heavy form (¹³C phenyl ring), showing no chromatographic shift and an optimal isotopic separation with a 6 Da mass shift. Moreover, sulfanilic acid allows for simplified fragmentation in matrix-assisted laser desorption/ionization (MALDI) due to the charge fixation of the sulfonate group at the C-terminus of the peptide. The derivatization is simple, specific and minimizes the number of sample treatment steps that can strongly alter the sample composition. The quantification is reproducible within an order of magnitude and can be analyzed either by electrospray ionization (ESI) or MALDI. Finally, the method is able to specifically identify the C-terminal peptide of a protein by using GluC as the proteolytic enzyme.
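
    The mass shifts behind the three labeling schemes follow directly from isotope mass differences. The short Python sketch below works through that arithmetic using standard monoisotopic masses rounded to five decimals. It assumes one label per peptide and that the heavy phenyl ring carries six ¹³C atoms, which is consistent with the 6 Da shift stated in the abstract.

```python
# Monoisotopic masses in daltons (standard values, rounded to 5 decimals).
ISOTOPE_MASS = {
    "1H": 1.00783, "2H": 2.01410,
    "12C": 12.00000, "13C": 13.00335,
    "16O": 15.99491, "18O": 17.99916,
}

def mass_shift(light: str, heavy: str, n_atoms: int) -> float:
    """Mass difference from replacing n_atoms light isotopes with heavy ones."""
    return n_atoms * (ISOTOPE_MASS[heavy] - ISOTOPE_MASS[light])

shifts = {
    "d3-methyl ester (3 x 2H)": mass_shift("1H", "2H", 3),
    "two 18O at the C-terminus": mass_shift("16O", "18O", 2),
    "13C6 phenyl ring of sulfanilic acid": mass_shift("12C", "13C", 6),
}
for label, delta in shifts.items():
    print(f"{label}: {delta:.2f} Da")
```

    The roughly 3 Da and 4 Da shifts of the older schemes leave the heavy form closer to the natural isotope envelope of the light form than the roughly 6 Da shift does, which is the overlap problem the abstract mentions.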

  8. Most Used Codons per Amino Acid and per Genome in the Code of Man Compared to Other Organisms According to the Rotating Circular Genetic Code

    PubMed Central

    Castro-Chavez, Fernando

    2011-01-01

    My previous theoretical research shows that the rotating circular genetic code is a viable tool that makes it easier to distinguish the rules of variation applied to amino acid exchange; it presents a precise and positional bio-mathematical balance of codons, according to the amino acids they codify. Here, I demonstrate that when using the conventional or classic circular genetic code, a clearer pattern for the human codon usage per amino acid and per genome emerges. The most used human codons per amino acid were the ones ending with the three-hydrogen-bond nucleotides: C for 12 amino acids and G for the remaining 8, plus one codon for arginine ending in A that was used with approximately the same frequency as the one ending in G for this same amino acid (plus *). The most used codons in man fall almost all the time at the rightmost position, clockwise, ending either in C or in G within the circular genetic code. The human codon usage per genome is compared to that of other organisms such as fruit flies (Drosophila melanogaster), squid (Loligo pealei), and many others. The biosemiotic codon usage of each genomic population or ‘Theme’ is equated to a ‘molecular language’. The C/U choice or difference, and the G/A difference in the third nucleotide of the most used codons per amino acid are illustrated by comparing the most used codons per genome in humans and squids. The human distribution in the third position of most used codons is a 12-8-2, C-G-A, nucleotide ending signature, while the squid distribution is odd, or uneven: 13-6-3, U-A-G, as its nucleotide ending signature. These findings may help to design computational tools to compare human genomes, to determine the exchangeability between compatible codons and amino acids, and for the early detection of incompatible changes leading to hereditary diseases. PMID:22997484
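
    The "nucleotide ending signature" described above can be tallied by finding the most used codon for each amino acid and counting the third bases of those codons. The Python sketch below illustrates the idea on a single toy coding sequence under the standard genetic code; it writes T where the abstract writes U, breaks ties by first occurrence, and its function names are illustrative. It is not the author's method, and a real analysis would pool all coding sequences of a genome.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def most_used_codons(cds: str) -> dict[str, str]:
    """Map each amino acid present in the sequence to its most frequent codon."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    by_amino_acid = defaultdict(dict)
    for codon, n in counts.items():
        by_amino_acid[CODON_TABLE[codon]][codon] = n
    return {aa: max(codons, key=codons.get) for aa, codons in by_amino_acid.items()}

def third_base_signature(top_codons: dict[str, str]) -> Counter:
    """Count the third bases of the most used codons, stop codons excluded."""
    return Counter(codon[2] for aa, codon in top_codons.items() if aa != "*")

toy_cds = "CTGCTGCTAGCCGCCGCTAAGAAGAAA"  # Leu x3, Ala x3, Lys x3
top = most_used_codons(toy_cds)
print(top)
print(third_base_signature(top))
```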

  9. Deconstruction of archaeal genome depict strategic consensus in core pathways coding sequence assembly.

    PubMed

    Pal, Ayon; Banerjee, Rachana; Mondal, Uttam K; Mukhopadhyay, Subhasis; Bothra, Asim K

    2015-01-01

    A comprehensive in silico analysis of 71 species representing the different taxonomic classes and physiological groups of the domain Archaea was performed. These organisms differed in their physiological attributes, particularly oxygen tolerance and energy metabolism. We explored the diversity and similarity in the codon usage pattern in the genes and genomes of these organisms, with emphasis on their core cellular pathways. Our aim was to determine whether there is any underlying similarity in the design of core pathways within these organisms. Analyses of codon utilization pattern, construction of hierarchical linear models of codon usage, expression pattern and codon pair preference indicated that in the archaea there is a trend towards biased use of synonymous codons in the core cellular pathways, and the Nc-plots appeared to display the physiological variations present within the different species. Our analyses revealed that aerobic species of archaea possessed a larger degree of freedom in regulating expression levels than could be accounted for by codon usage bias alone. This feature might be a consequence of their enhanced metabolic activities as a result of their adaptation to the relatively O2-rich environment. Species of archaea that are taxonomically related were found to have striking similarities in their ORF structuring pattern. In the anaerobic species of archaea, codon bias was found to be a major determinant of gene expression. We have also detected a significant difference in the codon pair usage pattern between the whole genome and the genes related to vital cellular pathways, and it was not only species-specific but also pathway-specific. This hints at the structuring of ORFs for better decoding accuracy during translation. Finally, a codon-pathway interaction in shaping the codon design of pathways was observed where the transcription pathway exhibited a significantly different coding frequency signature.

  10. The Number, Organization, and Size of Polymorphic Membrane Protein Coding Sequences as well as the Most Conserved Pmp Protein Differ within and across Chlamydia Species.

    PubMed

    Van Lent, Sarah; Creasy, Heather Huot; Myers, Garry S A; Vanrompay, Daisy

    2016-01-01

    Variation is a central trait of the polymorphic membrane protein (Pmp) family. The number of pmp coding sequences differs between Chlamydia species, but it is unknown whether the number of pmp coding sequences is constant within a Chlamydia species. The level of conservation of the Pmp proteins has previously only been determined for Chlamydia trachomatis. As different Pmp proteins might be indispensable for the pathogenesis of different Chlamydia species, this study investigated the conservation of Pmp proteins both within and across C. trachomatis, C. pneumoniae, C. abortus, and C. psittaci. The pmp coding sequences were annotated in 16 C. trachomatis, 6 C. pneumoniae, 2 C. abortus, and 16 C. psittaci genomes. The number and organization of polymorphic membrane coding sequences differed within and across the analyzed Chlamydia species. The length of coding sequences of pmpA, pmpB, and pmpH was conserved among all analyzed genomes, while the length of pmpE/F and pmpG, and remarkably also of the subtype pmpD, differed among the analyzed genomes. PmpD, PmpA, PmpH, and PmpA were the most conserved Pmp in C. trachomatis, C. pneumoniae, C. abortus, and C. psittaci, respectively. PmpB was the most conserved Pmp across the 4 analyzed Chlamydia species.

  11. Structural sequences are conserved in the genes coding for the alpha, alpha' and beta-subunits of the soybean 7S seed storage protein.

    PubMed Central

    Schuler, M A; Ladin, B F; Pollaco, J C; Freyer, G; Beachy, R N

    1982-01-01

    Cloned DNAs encoding four different proteins have been isolated from recombinant cDNA libraries constructed with Glycine max seed mRNAs. Two cloned DNAs code for the alpha and alpha'-subunits of the 7S seed storage protein (conglycinin). The other cloned cDNAs code for proteins which are synthesized in vitro as 68,000 d., 60,000 d. or 53,000 d. polypeptides. Hybrid selection experiments indicate that, under low stringency hybridization conditions, all four cDNAs hybridize with mRNAs for the alpha and alpha'-subunits and the 68,000 d., 60,000 d. and 53,000 d. in vitro translation products. Within three of the mRNAs, there is a conserved sequence of 155 nucleotides which is responsible for this hybridization. The conserved nucleotides in the alpha and alpha'-subunit cDNAs and the 68,000 d. polypeptide cDNAs span both coding and noncoding sequences. The differences in the coding nucleotides outside the conserved region are extensive. This suggests that selective pressure to maintain the 155 conserved nucleotides has been influenced by the structure of the seed mRNA. RNA blot hybridizations demonstrate that mRNA encoding the other major subunit (beta) of the 7S seed storage protein also shares sequence homology with the conserved 155 nucleotide sequence of the alpha and alpha'-subunit mRNAs, but not with other coding sequences. PMID:6897678

  12. Consistent levels of A-to-I RNA editing across individuals in coding sequences and non-conserved Alu repeats

    PubMed Central

    2010-01-01

    Background: Adenosine to inosine (A-to-I) RNA-editing is an essential post-transcriptional mechanism that occurs in numerous sites in the human transcriptome, mainly within Alu repeats. It has been shown to have consistent levels of editing across individuals in a few targets in the human brain and to be altered in several human pathologies. However, the variability across human individuals of editing levels in other tissues has not been studied so far. Results: Here, we analyzed 32 skin samples, looking at A-to-I editing level in three genes within coding sequences and in the Alu repeats of six different genes. We observed highly consistent editing levels across different individuals as well as across tissues, not only in coding targets but, surprisingly, also in the evolutionarily non-conserved Alu repeats. Conclusions: Our findings suggest that A-to-I RNA-editing of Alu elements is a tightly regulated process and, as such, might have been recruited in the course of primate evolution for post-transcriptional regulatory mechanisms. PMID:21029430

  13. AcalPred: a sequence-based tool for discriminating between acidic and alkaline enzymes.

    PubMed

    Lin, Hao; Chen, Wei; Ding, Hui

    2013-01-01

    The structure and activity of enzymes are influenced by pH value of their surroundings. Although many enzymes work well in the pH range from 6 to 8, some specific enzymes have good efficiencies only in acidic (pH<5) or alkaline (pH>9) solution. Studies have demonstrated that the activities of enzymes correlate with their primary sequences. It is crucial to judge enzyme adaptation to acidic or alkaline environment from its amino acid sequence in molecular mechanism clarification and the design of high efficient enzymes. In this study, we developed a sequence-based method to discriminate acidic enzymes from alkaline enzymes. The analysis of variance was used to choose the optimized discriminating features derived from g-gap dipeptide compositions. And support vector machine was utilized to establish the prediction model. In the rigorous jackknife cross-validation, the overall accuracy of 96.7% was achieved. The method can correctly predict 96.3% acidic and 97.1% alkaline enzymes. Through the comparison between the proposed method and previous methods, it is demonstrated that the proposed method is more accurate. On the basis of this proposed method, we have built an online web-server called AcalPred which can be freely accessed from the website (http://lin.uestc.edu.cn/server/AcalPred). We believe that the AcalPred will become a powerful tool to study enzyme adaptation to acidic or alkaline environment.
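The g-gap dipeptide composition mentioned in the abstract can be illustrated with a short sketch. It assumes the common definition in which a g-gap dipeptide is a pair of residues separated by g intervening positions, with each of the 400 possible pairs counted and normalized by the number of such pairs in the sequence. The function name, the toy sequence, and the choice to skip non-standard residues are illustrative assumptions, not details taken from AcalPred itself.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def g_gap_dipeptide_composition(sequence, g):
    """Return the frequency of each residue pair separated by g positions.

    Hypothetical helper: pairs containing non-standard residues are skipped.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = len(sequence) - g - 1  # number of (i, i + g + 1) position pairs
    if total <= 0:
        raise ValueError("sequence too short for this gap")
    for i in range(total):
        pair = sequence[i] + sequence[i + g + 1]
        if pair in counts:
            counts[pair] += 1
    return {pair: n / total for pair, n in counts.items()}


# Toy input, not a real enzyme sequence.
features = g_gap_dipeptide_composition("ACDACD", g=1)
print(features["AD"], features["CA"], features["DC"])
```

Each sequence then becomes a 400-dimensional vector per gap value, which is the kind of input a feature-selection step and a support vector machine could consume.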

  14. AcalPred: A Sequence-Based Tool for Discriminating between Acidic and Alkaline Enzymes

    PubMed Central

    Lin, Hao; Chen, Wei; Ding, Hui

    2013-01-01

    The structure and activity of enzymes are influenced by pH value of their surroundings. Although many enzymes work well in the pH range from 6 to 8, some specific enzymes have good efficiencies only in acidic (pH<5) or alkaline (pH>9) solution. Studies have demonstrated that the activities of enzymes correlate with their primary sequences. It is crucial to judge enzyme adaptation to acidic or alkaline environment from its amino acid sequence in molecular mechanism clarification and the design of high efficient enzymes. In this study, we developed a sequence-based method to discriminate acidic enzymes from alkaline enzymes. The analysis of variance was used to choose the optimized discriminating features derived from g-gap dipeptide compositions. And support vector machine was utilized to establish the prediction model. In the rigorous jackknife cross-validation, the overall accuracy of 96.7% was achieved. The method can correctly predict 96.3% acidic and 97.1% alkaline enzymes. Through the comparison between the proposed method and previous methods, it is demonstrated that the proposed method is more accurate. On the basis of this proposed method, we have built an online web-server called AcalPred which can be freely accessed from the website (http://lin.uestc.edu.cn/server/AcalPred). We believe that the AcalPred will become a powerful tool to study enzyme adaptation to acidic or alkaline environment. PMID:24130738

  15. A microtubule-associated protein in Drosophila melanogaster: identification, characterization, and isolation of coding sequences

    PubMed Central

    1986-01-01

    Microtubules and microtubule-associated proteins (MAPs) have been isolated from cultured cells of Drosophila melanogaster by a taxol- dependent polymerization procedure. The principal MAPs are a group of four polypeptides with similar electrophoretic mobilities corresponding to approximately Mr 205,000 (the 205K MAP). These proteins are resistant to precipitation by boiling. One mouse monoclonal antibody and one polyclonal rabbit antiserum specific for the Mr 205,000 MAP were produced and characterized by immunoblotting and indirect immunofluorescence. Both antibody preparations stain the Mr 205,000 molecules and an Mr 255,000 molecule in immunoblots of Drosophila cell homogenates; the rabbit antiserum also stains an Mr 150,000 triplet. Both preparations stain the microtubules of the mitotic spindle, and the rabbit antiserum stains the cytoplasmic microtubules as well. Experiments using affinity-purified rabbit antiserum demonstrate that it is the Mr 205,000 species that is located in the mitotic apparatus and on cytoplasmic microtubules. A random shear genomic library was produced in the expressing vector lambda gt11 and screened with the rabbit antiserum to isolate the DNA sequences encoding these polypeptides. Several cross-hybridizing clones were recovered, shown to encode antigenic determinants in the Mr 205,000 MAP, and characterized by hybridization to Northern blots of mRNA and Southern blots of genomic DNA. Analysis by in situ hybridization reveals that the gene encoding the 205K MAP is located in polytene region 100EF. PMID:3086324

  16. The non-coding RNA composition of the mitotic chromosome by 5′-tag sequencing

    PubMed Central

    Meng, Yicong; Yi, Xianfu; Li, Xinhui; Hu, Chuansheng; Wang, Ju; Bai, Ling; Czajkowsky, Daniel M.; Shao, Zhifeng

    2016-01-01

    Mitotic chromosomes are one of the most commonly recognized sub-cellular structures in eukaryotic cells. Yet basic information necessary to understand their structure and assembly, such as their composition, is still lacking. Recent proteomic studies have begun to fill this void, identifying hundreds of RNA-binding proteins bound to mitotic chromosomes. However, by contrast, there are only two RNA species (U3 snRNA and rRNA) that are known to be associated with the mitotic chromosome, suggesting that there are many mitotic chromosome-associated RNAs (mCARs) not yet identified. Here, using a targeted protocol based on 5′-tag sequencing to profile the mammalian mCAR population, we report the identification of 1279 mCARs, the majority of which are ncRNAs, including lncRNAs that exhibit greater conservation across 60 vertebrate species than the entire population of lncRNAs. There is also a significant enrichment of snoRNAs and specific SINE RNAs. Finally, ∼40% of the mCARs are presently unannotated, many of which are as abundant as the annotated mCARs, suggesting that there are also many novel ncRNAs in the mCARs. Overall, the mCARs identified here, together with the previous proteomic and genomic data, constitute the first comprehensive catalogue of the molecular composition of the eukaryotic mitotic chromosomes. PMID:27016738

  17. Antibody-specific model of amino acid substitution for immunological inferences from alignments of antibody sequences.

    PubMed

    Mirsky, Alexander; Kazandjian, Linda; Anisimova, Maria

    2015-03-01

    Antibodies are glycoproteins produced by the immune system as a dynamically adaptive line of defense against invading pathogens. Very elegant and specific mutational mechanisms allow B lymphocytes to produce a large and diversified repertoire of antibodies, which is modified and enhanced throughout all adulthood. One of these mechanisms is somatic hypermutation, which stochastically mutates nucleotides in the antibody genes, forming new sequences with different properties and, eventually, higher affinity and selectivity to the pathogenic target. As somatic hypermutation involves fast mutation of antibody sequences, this process can be described using a Markov substitution model of molecular evolution. Here, using large sets of antibody sequences from mice and humans, we infer an empirical amino acid substitution model AB, which is specific to antibody sequences. Compared with existing general amino acid models, we show that the AB model provides significantly better description for the somatic evolution of mice and human antibody sequences, as demonstrated on large next generation sequencing (NGS) antibody data. General amino acid models are reflective of conservation at the protein level due to functional constraints, with most frequent amino acids exchanges taking place between residues with the same or similar physicochemical properties. In contrast, within the variable part of antibody sequences we observed an elevated frequency of exchanges between amino acids with distinct physicochemical properties. This is indicative of a sui generis mutational mechanism, specific to antibody somatic hypermutation. We illustrate this property of antibody sequences by a comparative analysis of the network modularity implied by the AB model and general amino acid substitution models. 
We recommend using the new model for computational studies of antibody sequence maturation, including inference of alignments and phylogenetic trees describing antibody somatic hypermutation in

  18. The value of short amino acid sequence matches for prediction of protein allergenicity.

    PubMed

    Silvanovich, Andre; Nemeth, Margaret A; Song, Ping; Herman, Rod; Tagliani, Laura; Bannon, Gary A

    2006-03-01

    Typically, genetically engineered crops contain traits encoded by one or a few newly expressed proteins. The allergenicity assessment of newly expressed proteins is an important component in the safety evaluation of genetically engineered plants. One aspect of this assessment involves sequence searches that compare the amino acid sequence of the protein to all known allergens. Analyses are performed to determine the potential for immunologically based cross-reactivity where IgE directed against a known allergen could bind to the protein and elicit a clinical reaction in sensitized individuals. Bioinformatic searches are designed to detect global sequence similarity and short contiguous amino acid sequence identity. It has been suggested that potential allergen cross-reactivity may be predicted by identifying matches as short as six to eight contiguous amino acids between the protein of interest and a known allergen. A series of analyses were performed, and match probabilities were calculated for different size peptides to determine if there was a scientifically justified search window size that identified allergen sequence characteristics. Four probability modeling methods were tested: (1) a mock protein and a mock allergen database, (2) a mock protein and genuine allergen database, (3) a genuine allergen and genuine protein database, and (4) a genuine allergen and genuine protein database combined with a correction for repeating peptides. These analyses indicated that matches of eight or fewer contiguous amino acids, when used to identify proteins as potential cross-reactive allergens, are a product of chance and add little value to allergy assessments for newly expressed proteins.
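The short contiguous-identity search described above can be sketched as a sliding-window comparison. This is a minimal illustration, assuming exact matching of every peptide of a fixed window length between a query protein and one allergen sequence; the function name and both sequences are made up for the example and do not come from the study or from any allergen database.

```python
def contiguous_matches(query, allergen, window=8):
    """Return the sorted peptides of length `window` found in both sequences."""

    def peptides(seq):
        # Every contiguous stretch of `window` residues; empty if seq is too short.
        return {seq[i:i + window] for i in range(len(seq) - window + 1)}

    return sorted(peptides(query) & peptides(allergen))


# Hypothetical sequences for illustration only.
query = "MKTAYIAKQRQISFVK"
allergen = "GGAYIAKQRQLL"
print(contiguous_matches(query, allergen, window=8))
print(contiguous_matches(query, allergen, window=6))
```

Shrinking the window increases the number of hits, which is consistent with the abstract's concern that very short matches arise readily by chance.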

  19. Evolving genetic code

    PubMed Central

    OHAMA, Takeshi; INAGAKI, Yuji; BESSHO, Yoshitaka; OSAWA, Syozo

    2008-01-01

    In 1985, we reported that a bacterium, Mycoplasma capricolum, used a deviant genetic code in which UGA, a “universal” stop codon, is read as tryptophan. This finding, together with the deviant nuclear genetic codes in a considerable number of organisms and in a number of mitochondria, shows that the genetic code is not universal, and is in a state of evolution. To account for the changes in codon meanings, we proposed the codon capture theory, which states that all the code changes are non-disruptive and are not accompanied by changes in the amino acid sequences of proteins. Supporting evidence for the theory is presented in this review. A possible evolutionary process from the ancient to the present-day genetic code is also discussed. PMID:18941287
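The UGA reassignment described above can be made concrete with a small translation sketch. It builds the standard codon table from the conventional U, C, A, G ordering and derives a variant table in which only UGA is changed from stop to tryptophan (W), as reported for Mycoplasma capricolum. The variable names and the toy RNA string are illustrative assumptions.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = ["".join(c) for c in product(BASES, repeat=3)]
STANDARD_CODE = dict(zip(CODONS, STANDARD_AAS))
# Variant code: identical except that UGA encodes tryptophan instead of stop.
MYCOPLASMA_CODE = dict(STANDARD_CODE, UGA="W")


def translate(rna, code):
    """Translate an RNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(rna) - len(rna) % 3
    return "".join(code[rna[i:i + 3]] for i in range(0, usable, 3))


toy_rna = "AUGUGAUGGUAA"  # AUG UGA UGG UAA
print(translate(toy_rna, STANDARD_CODE))    # M*W*
print(translate(toy_rna, MYCOPLASMA_CODE))  # MWW*
```

Under the standard table the reading frame terminates at UGA, whereas the variant table reads through it and terminates only at UAA.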

  20. Enhanced Gene Expression Rather than Natural Polymorphism in Coding Sequence of the OsbZIP23 Determines Drought Tolerance and Yield Improvement in Rice Genotypes.

    PubMed

    Dey, Avishek; Samanta, Milan Kumar; Gayen, Srimonta; Sen, Soumitra K; Maiti, Mrinal K

    2016-01-01

    Drought is one of the major limiting factors for productivity of crops including rice (Oryza sativa L.). Understanding the role of allelic variations of key regulatory genes involved in stress-tolerance is essential for developing an effective strategy to combat drought. The bZIP transcription factors play a crucial role in abiotic-stress adaptation in plants via abscisic acid (ABA) signaling pathway. The present study aimed to search for allelic polymorphism in the OsbZIP23 gene across selected drought-tolerant and drought-sensitive rice genotypes, and to characterize the new allele through overexpression (OE) and gene-silencing (RNAi). Analyses of the coding DNA sequence (CDS) of the cloned OsbZIP23 gene revealed single nucleotide polymorphism at four places and a 15-nucleotide deletion at one place. The single-copy OsbZIP23 gene is expressed at relatively higher level in leaf tissues of drought-tolerant genotypes, and its abundance is more in reproductive stage. Cloning and sequence analyses of the OsbZIP23-promoter from drought-tolerant O. rufipogon and drought-sensitive IR20 cultivar showed variation in the number of stress-responsive cis-elements and a 35-nucleotide deletion at 5'-UTR in IR20. Analysis of the GFP reporter gene function revealed that the promoter activity of O. rufipogon is comparatively higher than that of IR20. The overexpression of any of the two polymorphic forms (1083 bp and 1068 bp CDS) of OsbZIP23 improved drought tolerance and yield-related traits significantly by retaining higher content of cellular water, soluble sugar and proline; and exhibited decrease in membrane lipid peroxidation in comparison to RNAi lines and non-transgenic plants. The OE lines showed higher expression of target genes-OsRab16B, OsRab21 and OsLEA3-1 and increased ABA sensitivity; indicating that OsbZIP23 is a positive transcriptional-regulator of the ABA-signaling pathway. 
Taken together, the present study concludes that the enhanced gene expression rather than

  1. Enhanced Gene Expression Rather than Natural Polymorphism in Coding Sequence of the OsbZIP23 Determines Drought Tolerance and Yield Improvement in Rice Genotypes

    PubMed Central

    Dey, Avishek; Samanta, Milan Kumar; Gayen, Srimonta; Sen, Soumitra K.; Maiti, Mrinal K.

    2016-01-01

    Drought is one of the major limiting factors for productivity of crops including rice (Oryza sativa L.). Understanding the role of allelic variations of key regulatory genes involved in stress-tolerance is essential for developing an effective strategy to combat drought. The bZIP transcription factors play a crucial role in abiotic-stress adaptation in plants via abscisic acid (ABA) signaling pathway. The present study aimed to search for allelic polymorphism in the OsbZIP23 gene across selected drought-tolerant and drought-sensitive rice genotypes, and to characterize the new allele through overexpression (OE) and gene-silencing (RNAi). Analyses of the coding DNA sequence (CDS) of the cloned OsbZIP23 gene revealed single nucleotide polymorphism at four places and a 15-nucleotide deletion at one place. The single-copy OsbZIP23 gene is expressed at relatively higher level in leaf tissues of drought-tolerant genotypes, and its abundance is more in reproductive stage. Cloning and sequence analyses of the OsbZIP23-promoter from drought-tolerant O. rufipogon and drought-sensitive IR20 cultivar showed variation in the number of stress-responsive cis-elements and a 35-nucleotide deletion at 5’-UTR in IR20. Analysis of the GFP reporter gene function revealed that the promoter activity of O. rufipogon is comparatively higher than that of IR20. The overexpression of any of the two polymorphic forms (1083 bp and 1068 bp CDS) of OsbZIP23 improved drought tolerance and yield-related traits significantly by retaining higher content of cellular water, soluble sugar and proline; and exhibited decrease in membrane lipid peroxidation in comparison to RNAi lines and non-transgenic plants. The OE lines showed higher expression of target genes-OsRab16B, OsRab21 and OsLEA3-1 and increased ABA sensitivity; indicating that OsbZIP23 is a positive transcriptional-regulator of the ABA-signaling pathway. Taken together, the present study concludes that the enhanced gene expression rather

  2. Enhanced Gene Expression Rather than Natural Polymorphism in Coding Sequence of the OsbZIP23 Determines Drought Tolerance and Yield Improvement in Rice Genotypes.

    PubMed

    Dey, Avishek; Samanta, Milan Kumar; Gayen, Srimonta; Sen, Soumitra K; Maiti, Mrinal K

    2016-01-01

    Drought is one of the major limiting factors for productivity of crops including rice (Oryza sativa L.). Understanding the role of allelic variations of key regulatory genes involved in stress-tolerance is essential for developing an effective strategy to combat drought. The bZIP transcription factors play a crucial role in abiotic-stress adaptation in plants via abscisic acid (ABA) signaling pathway. The present study aimed to search for allelic polymorphism in the OsbZIP23 gene across selected drought-tolerant and drought-sensitive rice genotypes, and to characterize the new allele through overexpression (OE) and gene-silencing (RNAi). Analyses of the coding DNA sequence (CDS) of the cloned OsbZIP23 gene revealed single nucleotide polymorphism at four places and a 15-nucleotide deletion at one place. The single-copy OsbZIP23 gene is expressed at relatively higher level in leaf tissues of drought-tolerant genotypes, and its abundance is more in reproductive stage. Cloning and sequence analyses of the OsbZIP23-promoter from drought-tolerant O. rufipogon and drought-sensitive IR20 cultivar showed variation in the number of stress-responsive cis-elements and a 35-nucleotide deletion at 5'-UTR in IR20. Analysis of the GFP reporter gene function revealed that the promoter activity of O. rufipogon is comparatively higher than that of IR20. The overexpression of any of the two polymorphic forms (1083 bp and 1068 bp CDS) of OsbZIP23 improved drought tolerance and yield-related traits significantly by retaining higher content of cellular water, soluble sugar and proline; and exhibited decrease in membrane lipid peroxidation in comparison to RNAi lines and non-transgenic plants. The OE lines showed higher expression of target genes-OsRab16B, OsRab21 and OsLEA3-1 and increased ABA sensitivity; indicating that OsbZIP23 is a positive transcriptional-regulator of the ABA-signaling pathway. 
Taken together, the present study concludes that the enhanced gene expression rather than

  3. Generic detection of poleroviruses using an RT-PCR assay targeting the RdRp coding sequence.

    PubMed

    Lotos, Leonidas; Efthimiou, Konstantinos; Maliogka, Varvara I; Katis, Nikolaos I

    2014-03-01

    In this study a two-step RT-PCR assay was developed for the generic detection of poleroviruses. The RdRp coding region was selected as the primers' target, since it differs significantly from that of other members in the family Luteoviridae and its sequence can be more informative than other regions in the viral genome. Species specific RT-PCR assays targeting the same region were also developed for the detection of the six most widespread poleroviral species (Beet mild yellowing virus, Beet western yellows virus, Cucurbit aphid-borne virus, Carrot red leaf virus, Potato leafroll virus and Turnip yellows virus) in Greece and the collection of isolates. These isolates along with other characterized ones were used for the evaluation of the generic PCR's detection range. The developed assay efficiently amplified a 593bp RdRp fragment from 46 isolates of 10 different Polerovirus species. Phylogenetic analysis using the generic PCR's amplicon sequence showed that although it cannot accurately infer evolutionary relationships within the genus it can differentiate poleroviruses at the species level. Overall, the described generic assay could be applied for the reliable detection of Polerovirus infections and, in combination with the specific PCRs, for the identification of new and uncharacterized species in the genus. PMID:24374125

  4. OrthoMaM v8: a database of orthologous exons and coding sequences for comparative genomics in mammals.

    PubMed

    Douzery, Emmanuel J P; Scornavacca, Celine; Romiguier, Jonathan; Belkhir, Khalid; Galtier, Nicolas; Delsuc, Frédéric; Ranwez, Vincent

    2014-07-01

    Comparative genomic studies extensively rely on alignments of orthologous sequences. Yet, selecting, gathering, and aligning orthologous exons and protein-coding sequences (CDS) that are relevant for a given evolutionary analysis can be a difficult and time-consuming task. In this context, we developed OrthoMaM, a database of ORTHOlogous MAmmalian Markers describing the evolutionary dynamics of orthologous genes in mammalian genomes using a phylogenetic framework. Since its first release in 2007, OrthoMaM has regularly evolved, not only to include newly available genomes but also to incorporate up-to-date software in its analytic pipeline. This eighth release integrates the 40 complete mammalian genomes available in Ensembl v73 and provides alignments, phylogenies, evolutionary descriptor information, and functional annotations for 13,404 single-copy orthologous CDS and 6,953 long exons. The graphical interface allows to easily explore OrthoMaM to identify markers with specific characteristics (e.g., taxa availability, alignment size, %G+C, evolutionary rate, chromosome location). It hence provides an efficient solution to sample preprocessed markers adapted to user-specific needs. OrthoMaM has proven to be a valuable resource for researchers interested in mammalian phylogenomics, evolutionary genomics, and has served as a source of benchmark empirical data sets in several methodological studies. OrthoMaM is available for browsing, query and complete or filtered downloads at http://www.orthomam.univ-montp2.fr/.

  5. Quantitative detection of Aspergillus spp. by real-time nucleic acid sequence-based amplification.

    PubMed

    Zhao, Yanan; Perlin, David S

    2013-01-01

    Rapid and quantitative detection of Aspergillus from clinical samples may facilitate an early diagnosis of invasive pulmonary aspergillosis (IPA). As nucleic acid-based detection is a viable option, we demonstrate that Aspergillus burdens can be rapidly and accurately detected by a novel real-time nucleic acid assay other than qPCR by using the combination of nucleic acid sequence-based amplification (NASBA) and the molecular beacon (MB) technology. Here, we detail a real-time NASBA assay to determine quantitative Aspergillus burdens in lungs and bronchoalveolar lavage (BAL) fluids of rats with experimental IPA.

  6. Draft Genome Sequence of the Butyric Acid Producer Clostridium tyrobutyricum Strain CIP I-776 (IFP923)

    PubMed Central

    Clément, Benjamin; Lopes Ferreira, Nicolas

    2016-01-01

    Here, we report the draft genome sequence of Clostridium tyrobutyricum CIP I-776 (IFP923), an efficient producer of butyric acid. The genome consists of a single chromosome of 3.19 Mb and provides useful data concerning the metabolic capacities of the strain. PMID:26941139

  7. Amino acid sequence of the encephalitogenic basic protein from human myelin

    PubMed Central

    Carnegie, P. R.

    1971-01-01

    Myelin from the central nervous system contains an unusual basic protein, which can induce experimental autoimmune encephalomyelitis. The basic protein from human brain was digested with trypsin and other enzymes and the sequence of the 170 amino acids was determined. The localization of the encephalitogenic determinants was described. Possible roles for the protein in the structure and function of myelin are discussed. PMID:4108501

  8. Sequence-specific formation of d-amino acids in a monoclonal antibody during light exposure.

    PubMed

    Mozziconacci, Olivier; Schöneich, Christian

    2014-11-01

    The photoirradiation of a monoclonal antibody 1 (mAb1) at λ = 254 nm and λmax = 305 nm resulted in the sequence-specific generation of d-Val, d-Tyr, and potentially d-Ala and d-Arg, in the heavy chain sequence [95-101] YCARVVY. d-Amino acid formation is most likely the product of reversible intermediary carbon-centered radical formation at the (α)C-positions of the respective amino acids ((α)C(•) radicals) through the action of Cys thiyl radicals (CysS(•)). The latter can be generated photochemically either through direct homolysis of cystine or through photoinduced electron transfer from Trp and/or Tyr residues. The potential of mAb1 sequences to undergo epimerization was first evaluated through covalent H/D exchange during photoirradiation in D2O, and proteolytic peptides exhibiting deuterium incorporation were monitored by HPLC-MS/MS analysis. Subsequently, mAb1 was photoirradiated in H2O, and peptides, for which deuterium incorporation in D2O had been documented, were purified by HPLC and subjected to hydrolysis and amino acid analysis. Importantly, not all peptide sequences which incorporated deuterium during photoirradiation in D2O also exhibited photoinduced d-amino acid formation. For example, the heavy chain sequence [12-18] VQPGGSL showed significant deuterium incorporation during photoirradiation in D2O, but no photoinduced formation of d-amino acids was detected. Instead this sequence contained ca. 22% d-Val in both a photoirradiated and a control sample. This observation could indicate that d-Val may have been generated either during production and/or storage or during sample preparation. While sample preparation did not lead to the formation of d-Val or other d-amino acids in the control sample for the heavy chain sequence [95-101] YCARVVY, we may have to consider that during hydrolysis N-terminal residues (such as in VQPGGSL) may be more prone to epimerization. We conclude that the photoinduced, radical-dependent formation of d-amino acids

  9. The complete amino acid sequence of chitinase-B from the leaves of pokeweed (Phytolacca americana).

    PubMed

    Tanigawa, M; Yamagami, T; Funatsu, G

    1995-05-01

    The complete amino acid sequence of pokeweed leaf chitinase-B (PLC-B) has been determined by first sequencing all 19 tryptic peptides derived from the reduced and S-carboxymethylated (RCm-) PLC-B and then connecting them by analyzing the chymotryptic peptides from three fragments produced by cyanogen bromide cleavage of RCm-PLC-B. PLC-B consists of 274 amino acid residues and has a molecular mass of 29,473 Da. Six cysteine residues are linked by disulfide bonds between Cys20 and Cys67, Cys50 and Cys57, and Cys159 and Cys188. From 58-68% sequence homology of PLC-B with five class III chitinases, it was concluded that PLC-B is a basic class III chitinase.

  10. Pyruvate decarboxylase from Pisum sativum. Properties, nucleotide and amino acid sequences.

    PubMed

    Mücke, U; Wohlfarth, T; Fiedler, U; Bäumlein, H; Rücknagel, K P; König, S

    1996-04-15

    To study the molecular structure and function of pyruvate decarboxylase (PDC) from plants the protein was isolated from pea seeds and partially characterised. The active enzyme which occurs in the form of higher oligomers consists of two different subunits appearing in SDS/PAGE and mass spectroscopy experiments. For further experiments, like X-ray crystallography, it was necessary to elucidate the protein sequence. Partial cDNA clones encoding pyruvate decarboxylase from seeds of Pisum sativum cv. Miko have been obtained by means of polymerase chain reaction techniques. The first sequences were found using degenerate oligonucleotide primers designed according to conserved amino acid sequences of known pyruvate decarboxylases. The missing parts of one cDNA were amplified applying the 3'- and 5'-rapid amplification of cDNA ends systems. The amino acid sequence deduced from the entire cDNA sequence displays strong similarity to pyruvate decarboxylases from other organisms, especially from plants. A molecular mass of 64 kDa was calculated for this protein correlating with estimations for the smaller subunit of the oligomeric enzyme. The PCR experiments led to at least three different clones representing the middle part of the PDC cDNA indicating the existence of three isozymes. Two of these isoforms could be confirmed on the protein level by sequencing tryptic peptides. Only anaerobically treated roots showed a positive signal for PDC mRNA in Northern analysis although the cDNA from imbibed seeds was successfully used for PCR.

  11. Toward a Catalog of Human Genes and Proteins: Sequencing and Analysis of 500 Novel Complete Protein Coding Human cDNAs

    PubMed Central

    Wiemann, Stefan; Weil, Bernd; Wellenreuther, Ruth; Gassenhuber, Johannes; Glassl, Sabine; Ansorge, Wilhelm; Böcher, Michael; Blöcker, Helmut; Bauersachs, Stefan; Blum, Helmut; Lauber, Jürgen; Düsterhöft, Andreas; Beyer, Andreas; Köhrer, Karl; Strack, Normann; Mewes, Hans-Werner; Ottenwälder, Birgit; Obermaier, Brigitte; Tampe, Jens; Heubner, Dagmar; Wambutt, Rolf; Korn, Bernhard; Klein, Michaela; Poustka, Annemarie

    2001-01-01

    With the complete human genomic sequence being unraveled, the focus will shift to gene identification and to the functional analysis of gene products. The generation of a set of cDNAs, both sequences and physical clones, which contains the complete and noninterrupted protein coding regions of all human genes will provide the indispensable tools for the systematic and comprehensive analysis of protein function to eventually understand the molecular basis of man. Here we report the sequencing and analysis of 500 novel human cDNAs containing the complete protein coding frame. Assignment to functional categories was possible for 52% (259) of the encoded proteins, the remaining fraction having no similarities with known proteins. By aligning the cDNA sequences with the sequences of the finished chromosomes 21 and 22 we identified a number of genes that either had been completely missed in the analysis of the genomic sequences or had been wrongly predicted. Three of these genes appear to be present in several copies. We conclude that full-length cDNA sequencing continues to be crucial also for the accurate identification of genes. The set of 500 novel cDNAs, and another 1000 full-coding cDNAs of known transcripts we have identified, adds up to cDNA representations covering 2%–5% of all human genes. We thus substantially contribute to the generation of a gene catalog, consisting of both full-coding cDNA sequences and clones, which should be made freely available and will become an invaluable tool for detailed functional studies. [The sequence data described in this paper have been submitted to the EMBL database under the accession nos. given in Table 2.] PMID:11230166

  12. Allelic polymorphism in arabian camel ribonuclease and the amino acid sequence of bactrian camel ribonuclease.

    PubMed

    Welling, G W; Mulder, H; Beintema, J J

    1976-04-01

    Pancreatic ribonucleases from several species (whitetail deer, roe deer, guinea pig, and arabian camel) exhibit more than one amino acid at particular positions in their amino acid sequences. Since these enzymes were isolated from pooled pancreas, the origin of this heterogeneity is not clear. The pancreatic ribonucleases from 11 individual arabian camels (Camelus dromedarius) have been investigated with respect to the lysine-glutamine heterogeneity at position 103 (Welling et al., 1975). Six ribonucleases showed only one basic band and five showed two bands after polyacrylamide gel electrophoresis, suggesting a gene frequency of about 0.75 for the Lys gene and about 0.25 for the Gln gene. The amino acid sequence of bactrian camel (Camelus bactrianus) ribonuclease isolated from individual pancreatic tissue was determined and compared with that of arabian camel ribonuclease. The only difference was observed at position 103. In the ribonucleases from two unrelated bactrian camels, only glutamine was observed at that position. PMID:962846
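The gene frequencies quoted in this abstract can be reproduced by simple gene counting. The sketch below assumes that each one-band animal is a Lys/Lys homozygote and each two-band animal a Lys/Gln heterozygote; that reading of the gel pattern, and the function name, are illustrative assumptions rather than details stated in the abstract.

```python
from fractions import Fraction


def allele_frequencies(n_lys_lys, n_lys_gln, n_gln_gln=0):
    """Count alleles in a diploid sample and return (Lys, Gln) frequencies."""
    total_alleles = 2 * (n_lys_lys + n_lys_gln + n_gln_gln)
    lys = Fraction(2 * n_lys_lys + n_lys_gln, total_alleles)
    gln = Fraction(2 * n_gln_gln + n_lys_gln, total_alleles)
    return lys, gln


# 11 camels: 6 with one band (assumed Lys/Lys), 5 with two bands (assumed Lys/Gln)
lys_freq, gln_freq = allele_frequencies(n_lys_lys=6, n_lys_gln=5)
print(f"Lys: {float(lys_freq):.3f}, Gln: {float(gln_freq):.3f}")
```

Under these assumptions the counts are 17 Lys and 5 Gln alleles out of 22, which is consistent with the reported values of about 0.75 and 0.25.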

  13. Nucleotide sequence of Crithidia fasciculata cytosol 5S ribosomal ribonucleic acid.

    PubMed

    MacKay, R M; Gray, M W; Doolittle, W F

    1980-11-11

    The complete nucleotide sequence of the cytosol 5S ribosomal ribonucleic acid of the trypanosomatid protozoan Crithidia fasciculata has been determined by a combination of T1-oligonucleotide catalog and gel sequencing techniques. The sequence is: GAGUACGACCAUACUUGAGUGAAAACACCAUAUCCCGUCCGAUUUGUGAAGUUAAGCACC CACAGGCUUAGUUAGUACUGAGGUCAGUGAUGACUCGGGAACCCUGAGUGCCGUACUCCCOH. This 5S ribosomal RNA is unique in having GAUU in place of the GAAC or GAUC found in all other prokaryotic and eukaryotic 5S RNAs, and thought to be involved in interactions with tRNAs. Comparisons to other eukaryotic cytosol 5S ribosomal RNA sequences indicate that the four major eukaryotic kingdoms (animals, plants, fungi, and protists) are about equally remote from each other, and that the latter kingdom may be the most internally diverse.

  14. Pattern recognition in nucleic acid sequences. II. An efficient method for finding locally stable secondary structures.

    PubMed Central

    Kanehisa, M I; Goad, W B

    1982-01-01

    We present a method for calculating all possible single hairpin loop secondary structures in a nucleic acid sequence in on the order of N² operations, where N is the total number of bases. Each structure may contain any number of bulges and internal loops. Most natural sequences are found to be indistinguishable from random sequences in the potential of forming secondary structures, which is defined by the frequency of possible secondary structures calculated by the method. There is a strong correlation between the higher G+C content and the higher structure forming potential. Interestingly, the removal of intervening sequences in mRNAs is almost always accompanied by an increase in the G+C content, which may suggest an involvement of structural stabilization in the mRNA maturation. PMID:6174936
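To give a feel for why a quadratic operation count is achievable, here is a much simplified sketch that is not the authors' method: it handles only perfect stems (no bulges or internal loops), uses no energy model, and treats G-U as a permitted pair. The function name and the `min_loop` parameter are hypothetical. Each cell of the table is filled once from the cell immediately inside it, so the whole table costs on the order of N² steps.

```python
def longest_hairpin_stem(seq, min_loop=3):
    """Return (stem_length, i, j) for the longest perfect hairpin stem in seq.

    stem[i][j] is the number of stacked base pairs (i, j), (i+1, j-1), ...
    counted inward from the outer pair (i, j).
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    stem = [[0] * n for _ in range(n)]
    best = (0, None, None)
    # Process pairs by increasing distance so the inner cell is already filled.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            if (seq[i], seq[j]) in pairs:
                stem[i][j] = stem[i + 1][j - 1] + 1
                if stem[i][j] > best[0]:
                    best = (stem[i][j], i, j)
    return best


result = longest_hairpin_stem("GGGGAAAACCCC")
print(result)
```

For the toy sequence above, the four G residues pair with the four C residues around an AAAA loop.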

  15. Sequence based structural characterization and genetic diversity analysis across coding and promoter regions of goat Toll-like receptor 5 gene.

    PubMed

    Goyal, Shubham; Dubey, P K; Sahoo, B R; Mishra, S K; Niranjan, S K; Singh, Sanjeev; Mahajan, Ritu; Kataria, R S

    2014-05-01

    The exploration of candidate immune response genes in goat may be vital in improving further our understanding about the species specific response to pathogens specifically among the ruminants. In this study, approximately 3.7 kb long genomic sequence of Toll-like receptor 5 (TLR5) covering the entire coding and 5' upstream regions of the gene, was characterized in the Indian goat breeds. Sequence analysis revealed a 2577-nucleotide long open reading frame (ORF) of goat TLR5, encoding 858 amino acids from single exon, similar to other ruminants. The domain structure analysis of goat TLR5 showed the presence of 13 leucine rich repeats (LRRs) in extracellular domain (amino acid position 1-634), single transmembrane domain (position 644-666), and a Toll/interleukin-1 receptor (position 692-837) in cytoplasmic domain, similar to other species. A total of 87 putative transcription factor binding sites were observed within the 5' upstream region of TLR5 gene in goat, 106 in cattle, and 103 in buffalo. Sixteen polymorphic sites were observed in goat TLR5 gene, out of which 10 non-synonymous SNPs were in the functionally important regions. However, none of the amino acid substitutions was found to be potentially damaging to the structure and function of the receptor protein. Further, one of the SNPs in the transmembrane region was genotyped by a TETRA-ARMS PCR in 444 goats of nine breeds from different geographical regions and having different utilities. A significant variation in allelic frequencies was observed across the milch and other types of goat breeds. The comparative modeling of goat TLR5 followed by molecular dynamics simulation gave an insight into its 3D structural arrangements. The molecular docking of Salmonella flagellin and TLR5 dimer elucidated LRRNT (N-terminal) to LRR4 as the key flagellin binding domains region in goat TLR5. The study shows that, although being highly conserved among the ruminants, comparatively high variations in goat TLR5 might give
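The ORF and protein lengths reported in this abstract agree once the stop codon is counted. The short check below assumes that the 2577-nucleotide ORF includes the terminating codon; the helper names are illustrative only.

```python
def orf_protein_length(orf_nt, includes_stop=True):
    """Number of amino acids encoded by an ORF of orf_nt nucleotides."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_nt // 3
    return codons - 1 if includes_stop else codons


def domain_length(start, end):
    """Residue count for an inclusive amino acid range."""
    return end - start + 1


protein_len = orf_protein_length(2577)  # 859 codons, one of which is the stop
domains = {
    "extracellular": domain_length(1, 634),
    "transmembrane": domain_length(644, 666),
    "TIR": domain_length(692, 837),
}
print(protein_len, domains)
```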

  16. Isolation and sequencing of a cDNA coding for the human DF3 breast carcinoma-associated antigen

    SciTech Connect

    Siddiqui, J.; Abe, M.; Hayes, D.; Shani, E.; Yunis, E.; Kufe, D.

    1988-04-01

    The murine monoclonal antibody (mAb) DF3 reacts with a high molecular weight glycoprotein detectable in human breast carcinomas. DF3 antigen expression correlates with human breast tumor differentiation, and the detection of a cross-reactive species in human milk has suggested that this antigen might be useful as a marker of differentiated mammary epithelium. To further characterize DF3 antigen expression, the authors have isolated a cDNA clone from a λgt11 library by screening with mAb DF3. The results demonstrate that this 309-base-pair cDNA, designated pDF9.3, codes for the DF3 epitope. Southern blot analyses of EcoRI-digested DNAs from six human tumor cell lines with ³²P-labeled pDF9.3 have revealed a restriction fragment length polymorphism. Variations in size of the alleles detected by pDF9.3 were also identified in Pst I, but not in HindIII, DNA digests. Furthermore, hybridization of ³²P-labeled pDF9.3 with total cellular RNA from each of these cell lines demonstrated either one or two transcripts that varied from 4.1 to 7.1 kilobases in size. The presence of differently sized transcripts detected by pDF9.3 was also found to correspond with the polymorphic expression of DF3 glycoproteins. Nucleotide sequence analysis of pDF9.3 has revealed a highly conserved (G + C)-rich 60-base-pair tandem repeat. These findings suggest that the variation in size of alleles coding for the polymorphic DF3 glycoprotein may represent different numbers of repeats.

  17. Efficient Nucleic Acid Extraction and 16S rRNA Gene Sequencing for Bacterial Community Characterization.

    PubMed

    Anahtar, Melis N; Bowman, Brittany A; Kwon, Douglas S

    2016-01-01

    There is a growing appreciation for the role of microbial communities as critical modulators of human health and disease. High throughput sequencing technologies have allowed for the rapid and efficient characterization of bacterial communities using 16S rRNA gene sequencing from a variety of sources. Although readily available tools for 16S rRNA sequence analysis have standardized computational workflows, sample processing for DNA extraction remains a continued source of variability across studies. Here we describe an efficient, robust, and cost effective method for extracting nucleic acid from swabs. We also delineate downstream methods for 16S rRNA gene sequencing, including generation of sequencing libraries, data quality control, and sequence analysis. The workflow can accommodate multiple sample types, including stool and swabs collected from a variety of anatomical locations and host species. Additionally, recovered DNA and RNA can be separated and used for other applications, including whole genome sequencing or RNA-seq. The method described allows for a common processing approach for multiple sample types and accommodates downstream analysis of genomic, metagenomic and transcriptional information. PMID:27168460

  18. Efficient Nucleic Acid Extraction and 16S rRNA Gene Sequencing for Bacterial Community Characterization

    PubMed Central

    Anahtar, Melis N.; Bowman, Brittany A.; Kwon, Douglas S.

    2016-01-01

    There is a growing appreciation for the role of microbial communities as critical modulators of human health and disease. High throughput sequencing technologies have allowed for the rapid and efficient characterization of bacterial communities using 16S rRNA gene sequencing from a variety of sources. Although readily available tools for 16S rRNA sequence analysis have standardized computational workflows, sample processing for DNA extraction remains a continued source of variability across studies. Here we describe an efficient, robust, and cost effective method for extracting nucleic acid from swabs. We also delineate downstream methods for 16S rRNA gene sequencing, including generation of sequencing libraries, data quality control, and sequence analysis. The workflow can accommodate multiple sample types, including stool and swabs collected from a variety of anatomical locations and host species. Additionally, recovered DNA and RNA can be separated and used for other applications, including whole genome sequencing or RNA-seq. The method described allows for a common processing approach for multiple sample types and accommodates downstream analysis of genomic, metagenomic and transcriptional information. PMID:27168460

  19. Design of nucleic acid sequences for DNA computing based on a thermodynamic approach.

    PubMed

    Tanaka, Fumiaki; Kameda, Atsushi; Yamamoto, Masahito; Ohuchi, Azuma

    2005-01-01

    We have developed an algorithm for designing multiple sequences of nucleic acids that have a uniform melting temperature between the sequence and its complement and that do not hybridize non-specifically with each other based on the minimum free energy (ΔG(min)). Sequences that satisfy these constraints can be utilized in computations, various engineering applications such as microarrays, and nano-fabrications. Our algorithm is a random generate-and-test algorithm: it generates a candidate sequence randomly and tests whether the sequence satisfies the constraints. The novelty of our algorithm is that the filtering method uses a greedy search to calculate ΔG(min). This effectively excludes inappropriate sequences before ΔG(min) is calculated, thereby reducing computation time drastically when compared with an algorithm without the filtering. Experimental results in silico showed the superiority of the greedy search over the traditional approach based on the Hamming distance. In addition, experimental results in vitro demonstrated that the experimental free energy (ΔG(exp)) of 126 sequences correlated better with ΔG(min) (|R| = 0.90) than with the Hamming distance (|R| = 0.80). These results validate the rationality of a thermodynamic approach. We implemented our algorithm in a graphic user interface-based program written in Java.
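The random generate-and-test loop described in this abstract can be sketched as follows. This is not the authors' algorithm: the thermodynamic ΔG(min) test is replaced here by the simpler Hamming-distance check that the abstract names as the traditional approach, and melting temperature is estimated with the Wallace rule of thumb (2 °C per A/T, 4 °C per G/C). All function names, parameters, and default values are assumptions chosen for illustration.

```python
import random


def wallace_tm(seq):
    """Rough melting temperature estimate: 2*(A+T) + 4*(G+C)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def reverse_complement(seq):
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(seq))


def design_sequences(count, length=20, tm_target=60, tm_tol=2,
                     min_dist=6, max_tries=100_000, seed=0):
    """Generate random candidates and keep those passing both tests."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == count:
            break
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        # Test 1: uniform melting temperature.
        if abs(wallace_tm(cand) - tm_target) > tm_tol:
            continue
        # Test 2 (stand-in for the free-energy test): stay far from every
        # accepted sequence and from its reverse complement.
        if all(hamming(cand, s) >= min_dist and
               hamming(cand, reverse_complement(s)) >= min_dist
               for s in accepted):
            accepted.append(cand)
    return accepted


designed = design_sequences(5)
for s in designed:
    print(s, wallace_tm(s))
```

Swapping the second test for a free-energy calculation would bring the sketch closer to the thermodynamic approach the abstract argues for, at higher computational cost per candidate.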

  20. Molecular evolutionary history of Sugarcane yellow leaf virus based on sequence analysis of RNA-dependent RNA polymerase and putative aphid transmission factor-coding genes.

    PubMed

    ElSayed, Abdelaleim Ismail; Boulila, Moncef; Rott, Philippe

    2014-06-01

    RNA-dependent RNA polymerase (RdRp) encoded by ORF2 and putative aphid transmission factor (PATF) encoded by ORF5 of Sugarcane yellow leaf virus (SCYLV) were detected in six sugarcane cultivars affected by yellow leaf using RT-PCR and real-time RT-PCR assays. Expression of both genes varied among infected plants, but overall expression of RdRp was higher than expression of PATF. Cultivar H87-4094 from Hawaii yielded the highest transcript levels of RdRp, whereas cultivar C1051-73 from Cuba exhibited the lowest levels. Sequence comparisons among 25 SCYLV isolates from various geographical locations revealed an amino acid similarity of 72.1-99.4 and 84.7-99.8 % for the RdRp and PATF genes, respectively. The 25 SCYLV isolates were separated into three (RdRp) and two (PATF) phylogenetic groups using the MEGA6 program that does not account for genetic recombination. However, the SCYLV genome contained potential recombination signals in the RdRp and PATF coding genes based on the GARD genetic algorithm. Use of this latter program resulted in the reconstruction of phylogenies on the left as well as on the right sides of the putative recombination breaking points, and the 25 SCYLV isolates were distributed into three distinct phylogenetic groups based on either RdRp or PATF sequences. As a result, recombination reshuffled the affiliation of the accessions to the different clusters. Analysis of selection pressures exerted on RdRp and PATF encoded proteins revealed that ORF 2 and ORF 5 underwent predominantly purifying selection. However, a few sites were also under positive selection as assessed by various models such as FEL, IFEL, REL, FUBAR, MEME, GA-Branch, and PRIME. PMID:24952671

  1. Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Patel, Kamlesh D [Ken]; SNL

    2016-07-12

    Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  2. Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    SciTech Connect

    Patel, Kamlesh D; SNL

    2012-06-01

    Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  3. Sequencing the GRHL3 Coding Region Reveals Rare Truncating Mutations and a Common Susceptibility Variant for Nonsyndromic Cleft Palate.

    PubMed

    Mangold, Elisabeth; Böhmer, Anne C; Ishorst, Nina; Hoebel, Ann-Kathrin; Gültepe, Pinar; Schuenke, Hannah; Klamt, Johanna; Hofmann, Andrea; Gölz, Lina; Raff, Ruth; Tessmann, Peter; Nowak, Stefanie; Reutter, Heiko; Hemprich, Alexander; Kreusch, Thomas; Kramer, Franz-Josef; Braumann, Bert; Reich, Rudolf; Schmidt, Gül; Jäger, Andreas; Reiter, Rudolf; Brosch, Sibylle; Stavusis, Janis; Ishida, Miho; Seselgyte, Rimante; Moore, Gudrun E; Nöthen, Markus M; Borck, Guntram; Aldhorae, Khalid A; Lace, Baiba; Stanier, Philip; Knapp, Michael; Ludwig, Kerstin U

    2016-04-01

    Nonsyndromic cleft lip with/without cleft palate (nsCL/P) and nonsyndromic cleft palate only (nsCPO) are the most frequent subphenotypes of orofacial clefts. A common syndromic form of orofacial clefting is Van der Woude syndrome (VWS) where individuals have CL/P or CPO, often but not always associated with lower lip pits. Recently, ∼5% of VWS-affected individuals were identified with mutations in the grainy head-like 3 gene (GRHL3). To investigate GRHL3 in nonsyndromic clefting, we sequenced its coding region in 576 Europeans with nsCL/P and 96 with nsCPO. Most strikingly, nsCPO-affected individuals had a higher minor allele frequency for rs41268753 (0.099) than control subjects (0.049; p = 1.24 × 10⁻²). This association was replicated in nsCPO/control cohorts from Latvia, Yemen, and the UK (p_combined = 2.63 × 10⁻⁵; OR_allelic = 2.46 [95% CI 1.6-3.7]) and reached genome-wide significance in combination with imputed data from a GWAS in nsCPO triads (p = 2.73 × 10⁻⁹). Notably, rs41268753 is not associated with nsCL/P (p = 0.45). rs41268753 encodes the highly conserved p.Thr454Met (c.1361C>T) (GERP = 5.3), which prediction programs denote as deleterious, has a CADD score of 29.6, and increases protein binding capacity in silico. Sequencing also revealed four novel truncating GRHL3 mutations including two that were de novo in four families, where all nine individuals harboring mutations had nsCPO. This is important for genetic counseling: given that VWS is rare compared to nsCPO, our data suggest that dominant GRHL3 mutations are more likely to cause nonsyndromic than syndromic CPO. Thus, with rare dominant mutations and a common risk variant in the coding region, we have identified an important contribution for GRHL3 in nsCPO. PMID:27018475
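As a rough illustration of how an allelic odds ratio relates to the minor allele frequencies quoted in this abstract, the sketch below computes it from the two discovery-cohort frequencies alone. The function name is hypothetical, and the result applies only to the initial sequencing comparison; it is not expected to match the OR of 2.46 reported for the combined replication cohorts.

```python
def allelic_odds_ratio(maf_cases, maf_controls):
    """Odds of carrying the minor allele in cases relative to controls."""
    odds_cases = maf_cases / (1 - maf_cases)
    odds_controls = maf_controls / (1 - maf_controls)
    return odds_cases / odds_controls


# rs41268753 minor allele frequency: 0.099 in nsCPO cases, 0.049 in controls
discovery_or = allelic_odds_ratio(0.099, 0.049)
print(round(discovery_or, 2))
```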

  4. Sequencing the GRHL3 Coding Region Reveals Rare Truncating Mutations and a Common Susceptibility Variant for Nonsyndromic Cleft Palate

    PubMed Central

    Mangold, Elisabeth; Böhmer, Anne C.; Ishorst, Nina; Hoebel, Ann-Kathrin; Gültepe, Pinar; Schuenke, Hannah; Klamt, Johanna; Hofmann, Andrea; Gölz, Lina; Raff, Ruth; Tessmann, Peter; Nowak, Stefanie; Reutter, Heiko; Hemprich, Alexander; Kreusch, Thomas; Kramer, Franz-Josef; Braumann, Bert; Reich, Rudolf; Schmidt, Gül; Jäger, Andreas; Reiter, Rudolf; Brosch, Sibylle; Stavusis, Janis; Ishida, Miho; Seselgyte, Rimante; Moore, Gudrun E.; Nöthen, Markus M.; Borck, Guntram; Aldhorae, Khalid A.; Lace, Baiba; Stanier, Philip; Knapp, Michael; Ludwig, Kerstin U.

    2016-01-01

    Nonsyndromic cleft lip with/without cleft palate (nsCL/P) and nonsyndromic cleft palate only (nsCPO) are the most frequent subphenotypes of orofacial clefts. A common syndromic form of orofacial clefting is Van der Woude syndrome (VWS) where individuals have CL/P or CPO, often but not always associated with lower lip pits. Recently, ∼5% of VWS-affected individuals were identified with mutations in the grainy head-like 3 gene (GRHL3). To investigate GRHL3 in nonsyndromic clefting, we sequenced its coding region in 576 Europeans with nsCL/P and 96 with nsCPO. Most strikingly, nsCPO-affected individuals had a higher minor allele frequency for rs41268753 (0.099) than control subjects (0.049; p = 1.24 × 10⁻²). This association was replicated in nsCPO/control cohorts from Latvia, Yemen, and the UK (p_combined = 2.63 × 10⁻⁵; OR_allelic = 2.46 [95% CI 1.6–3.7]) and reached genome-wide significance in combination with imputed data from a GWAS in nsCPO triads (p = 2.73 × 10⁻⁹). Notably, rs41268753 is not associated with nsCL/P (p = 0.45). rs41268753 encodes the highly conserved p.Thr454Met (c.1361C>T) (GERP = 5.3), which prediction programs denote as deleterious, has a CADD score of 29.6, and increases protein binding capacity in silico. Sequencing also revealed four novel truncating GRHL3 mutations including two that were de novo in four families, where all nine individuals harboring mutations had nsCPO. This is important for genetic counseling: given that VWS is rare compared to nsCPO, our data suggest that dominant GRHL3 mutations are more likely to cause nonsyndromic than syndromic CPO. Thus, with rare dominant mutations and a common risk variant in the coding region, we have identified an important contribution for GRHL3 in nsCPO. PMID:27018475

  5. Studies on adenosine triphosphate transphosphorylases. Amino acid sequence of rabbit muscle ATP-AMP transphosphorylase.

    PubMed

    Kuby, S A; Palmieri, R H; Frischat, A; Fischer, A H; Wu, L H; Maland, L; Manship, M

    1984-05-22

    The total amino acid sequence of rabbit muscle adenylate kinase has been determined, and the single polypeptide chain of 194 amino acid residues starts with N-acetylmethionine and ends with leucyllysine at its carboxyl terminus, in agreement with the earlier data on its amino acid composition [Mahowald, T. A., Noltmann, E. A., & Kuby, S. A. (1962) J. Biol. Chem. 237, 1138-1145] and its carboxyl-terminus sequence [Olson, O. E., & Kuby, S. A. (1964) J. Biol. Chem. 239, 460-467]. Elucidation of the primary structure was based on tryptic and chymotryptic cleavages of the performic acid oxidized protein, cyanogen bromide cleavages of the 14C-labeled S-carboxymethylated protein at its five methionine sites (followed by maleylation of peptide fragments), and tryptic cleavages at its 12 arginine sites of the maleylated 14C-labeled S-carboxymethylated protein. Calf muscle myokinase, whose sequence has also been established, differs primarily from the rabbit muscle myokinase's sequence in the following: His-30 is replaced by Gln-30; Lys-56 is replaced by Met-56; Ala-84 and Asp 85 are replaced by Val-84 and Asn-85. A comparison of the four muscle-type adenylate kinases, whose covalent structures have now been determined, viz., rabbit, calf, porcine, and human [for the latter two sequences see Heil, A., Müller, G., Noda, L., Pinder, T., Schirmer, H., Schirmer, I., & Von Zabern, I. (1974) Eur. J. Biochem. 43, 131-144, and Von Zabern, I., Wittmann-Liebold, B., Untucht-Grau, R., Schirmer, R. H., & Pai, E. F. (1976) Eur. J. Biochem. 68, 281-290], demonstrates an extraordinary degree of homology.(ABSTRACT TRUNCATED AT 250 WORDS)

  6. Deduced amino acid sequence of human pulmonary surfactant proteolipid: SPL(pVal)

    SciTech Connect

    Whitsett, J.A.; Glasser, S.W.; Korfhagen, T.R.; Weaver, T.E.; Clark, J.; Pilot-Matias, T.; Meuth, J.; Fox, J.L.

    1987-05-01

    A hydrophobic, proteolipid-like protein of Mr 6500 was isolated from ether/ethanol extracts of human, canine and bovine pulmonary surfactant. Amino acid composition of the protein demonstrated a remarkable abundance of hydrophobic residues, particularly valine and leucine. The N-terminal amino acid sequence of the human protein was determined: N-Leu-Ile-Pro-Cys-Cys-Pro-Val-Asn-Leu-Lys-Arg-Leu-Leu-Ile-Val4... An oligonucleotide probe was used to screen an adult human lung cDNA library and resulted in detection of cDNA clones with predicted amino acid sequence with close identity to the N-terminal amino acid sequence of the human peptide. SPL(pVal) was found within the reading frame of a larger peptide. SPL(pVal) results from proteolytic processing of a larger preprotein. Northern blot analysis detected a single 1.0 kilobase SPL(pVal) RNA which was less abundant in fetal than in adult lung. Mixtures of purified canine and bovine SPL(pVal) and synthetic phospholipids display properties of rapid adsorption and surface tension lowering activity characteristic of surfactant. Human SPL(pVal) is a pulmonary surfactant proteolipid which may therefore be useful in combination with phospholipids and/or other surfactant proteins for the treatment of surfactant deficiency such as hyaline membrane disease in newborn infants.

  7. Complete amino acid sequence of a human monocyte chemoattractant, a putative mediator of cellular immune reactions.

    PubMed Central

    Robinson, E A; Yoshimura, T; Leonard, E J; Tanaka, S; Griffin, P R; Shabanowitz, J; Hunt, D F; Appella, E

    1989-01-01

    In a study of the structural basis for leukocyte specificity of chemoattractants, we determined the complete amino acid sequence of human glioma-derived monocyte chemotactic factor (GDCF-2), a peptide that attracts human monocytes but not neutrophils. The choice of a tumor cell product for analysis was dictated by its relative abundance and an amino acid composition indistinguishable from that of lymphocyte-derived chemotactic factor (LDCF), the agonist thought to account for monocyte accumulation in cellular immune reactions. By a combination of Edman degradation and mass spectrometry, it was established that GDCF-2 comprises 76 amino acid residues, commencing at the N terminus with pyroglutamic acid. The peptide contains four half-cystines, at positions 11, 12, 36, and 52, which create a pair of loops, clustered at the disulfide bridges. The relative positions of the half-cystines are almost identical to those of monocyte-derived neutrophil chemotactic factor (MDNCF), a peptide of similar mass but with only 24% sequence identity to GDCF. Thus, GDCF and MDNCF have a similar gross secondary structure because of the loops formed by the clustered disulfides, and their different leukocyte specificities are most likely determined by the large differences in primary sequence. PMID:2648385

  8. Amino acid sequences of lower vertebrate parvalbumins and their evolution: parvalbumins of boa, turtle, and salamander.

    PubMed

    Maeda, N; Zhu, D X; Fitch, W M

    1984-11-01

    One major parvalbumin each was isolated from the skeletal muscle of two reptiles, a boa snake, Boa constrictor, and a map turtle, Graptemys geographica, while two parvalbumins were isolated from an amphibian, the salamander Amphiuma means. The amino acid sequences of all four parvalbumins were determined from the sequences of their tryptic peptides, which were ordered partially by homology to other parvalbumins. Phylogenetic study of these and 16 other parvalbumin sequences revealed that the turtle parvalbumin belongs to beta lineage, while the salamander sequences belong, one each, to the alpha and beta lineages defined by Goodman and Pechère (1977). Boa parvalbumin, however, while belonging to the beta lineage, clusters within the fish in all reasonably parsimonious trees. The most parsimonious trees show many parallel or back mutations in the evolution of many parvalbumin residues, although the residues responsible for Ca2+ binding are very well conserved. These most parsimonious trees show an actinopterygian rather than a crossopterygian origin of the tetrapods in both the alpha and beta groups. One of two electric eel parvalbumins is evolving more than 10 times faster than its paralogous partner, suggesting it may be on its way to becoming a pseudogene. It is concluded that varying rates of amino acid replacement, much homoplasy, considerable gene duplication, plus complicated lineages make the set of parvalbumin sequences unsuitable for systematic study of the origin of the tetrapods and other higher-taxa divergence, although it may be suitable within a genus or family.

  9. DNA Cloning of Plasmodium falciparum Circumsporozoite Gene: Amino Acid Sequence of Repetitive Epitope

    NASA Astrophysics Data System (ADS)

    Enea, Vincenzo; Ellis, Joan; Zavala, Fidel; Arnot, David E.; Asavanich, Achara; Masuda, Aoi; Quakyi, Isabella; Nussenzweig, Ruth S.

    1984-08-01

    A clone of complementary DNA encoding the circumsporozoite (CS) protein of the human malaria parasite Plasmodium falciparum has been isolated by screening an Escherichia coli complementary DNA library with a monoclonal antibody to the CS protein. The DNA sequence of the complementary DNA insert encodes a four-amino acid sequence: proline-asparagine-alanine-asparagine, tandemly repeated 23 times. The CS β-lactamase fusion protein specifically binds monoclonal antibodies to the CS protein and inhibits the binding of these antibodies to native Plasmodium falciparum CS protein. These findings provide a basis for the development of a vaccine against Plasmodium falciparum malaria.

  10. Nucleotide and amino acid sequences of human intestinal alkaline phosphatase: close homology to placental alkaline phosphatase

    SciTech Connect

    Henthorn, P.S.; Raducha, M.; Edwards, Y.H.; Weiss, M.J.; Slaughter, C.; Lafferty, M.A.; Harris, H.

    1987-03-01

    A cDNA clone for human adult intestinal alkaline phosphatase (ALP) (orthophosphoric-monoester phosphohydrolase (alkaline optimum); EC 3.1.3.1) was isolated from a λgt11 expression library. The cDNA insert of this clone is 2513 base pairs in length and contains an open reading frame that encodes a 528-amino acid polypeptide. This deduced polypeptide contains the first 40 amino acids of human intestinal ALP, as determined by direct protein sequencing. Intestinal ALP shows 86.5% amino acid identity to placental (type 1) ALP and 56.6% amino acid identity to liver/bone/kidney ALP. In the 3'-untranslated regions, intestinal and placental ALP cDNAs are 73.5% identical (excluding gaps). The evolution of this multigene enzyme family is discussed.

  11. Three-Dimensional Algebraic Models of the tRNA Code and 12 Graphs for Representing the Amino Acids.

    PubMed

    José, Marco V; Morgado, Eberto R; Guimarães, Romeu Cardoso; Zamudio, Gabriel S; de Farías, Sávio Torres; Bobadilla, Juan R; Sosa, Daniela

    2014-01-01

    Three-dimensional algebraic models, also called Genetic Hotels, are developed to represent the Standard Genetic Code, the Standard tRNA Code (S-tRNA-C), and the Human tRNA code (H-tRNA-C). New algebraic concepts are introduced to be able to describe these models, to wit, the generalization of the 2^n-Klein Group and the concept of a subgroup coset with a tail. We found that the H-tRNA-C displayed broken symmetries in regard to the S-tRNA-C, which is highly symmetric. We also show that there are only 12 ways to represent each of the corresponding phenotypic graphs of amino acids. The averages of statistical centrality measures of the 12 graphs for each of the three codes are carried out and they are statistically compared. The phenotypic graphs of the S-tRNA-C display a common triangular prism of amino acids in 10 out of the 12 graphs, whilst the corresponding graphs for the H-tRNA-C display only two triangular prisms. The graphs exhibit disjoint clusters of amino acids when their polar requirement values are used. We contend that the S-tRNA-C is in a frozen-like state, whereas the H-tRNA-C may be in an evolving state. PMID:25370377

  12. Three-Dimensional Algebraic Models of the tRNA Code and 12 Graphs for Representing the Amino Acids

    PubMed Central

    José, Marco V.; Morgado, Eberto R.; Guimarães, Romeu Cardoso; Zamudio, Gabriel S.; de Farías, Sávio Torres; Bobadilla, Juan R.; Sosa, Daniela

    2014-01-01

    Three-dimensional algebraic models, also called Genetic Hotels, are developed to represent the Standard Genetic Code, the Standard tRNA Code (S-tRNA-C), and the Human tRNA code (H-tRNA-C). New algebraic concepts are introduced to be able to describe these models, to wit, the generalization of the 2^n-Klein Group and the concept of a subgroup coset with a tail. We found that the H-tRNA-C displayed broken symmetries in regard to the S-tRNA-C, which is highly symmetric. We also show that there are only 12 ways to represent each of the corresponding phenotypic graphs of amino acids. The averages of statistical centrality measures of the 12 graphs for each of the three codes are carried out and they are statistically compared. The phenotypic graphs of the S-tRNA-C display a common triangular prism of amino acids in 10 out of the 12 graphs, whilst the corresponding graphs for the H-tRNA-C display only two triangular prisms. The graphs exhibit disjoint clusters of amino acids when their polar requirement values are used. We contend that the S-tRNA-C is in a frozen-like state, whereas the H-tRNA-C may be in an evolving state. PMID:25370377

  13. Three-Dimensional Algebraic Models of the tRNA Code and 12 Graphs for Representing the Amino Acids.

    PubMed

    José, Marco V; Morgado, Eberto R; Guimarães, Romeu Cardoso; Zamudio, Gabriel S; de Farías, Sávio Torres; Bobadilla, Juan R; Sosa, Daniela

    2014-08-11

    Three-dimensional algebraic models, also called Genetic Hotels, are developed to represent the Standard Genetic Code, the Standard tRNA Code (S-tRNA-C), and the Human tRNA code (H-tRNA-C). New algebraic concepts are introduced to be able to describe these models, to wit, the generalization of the 2^n-Klein Group and the concept of a subgroup coset with a tail. We found that the H-tRNA-C displayed broken symmetries in regard to the S-tRNA-C, which is highly symmetric. We also show that there are only 12 ways to represent each of the corresponding phenotypic graphs of amino acids. The averages of statistical centrality measures of the 12 graphs for each of the three codes are carried out and they are statistically compared. The phenotypic graphs of the S-tRNA-C display a common triangular prism of amino acids in 10 out of the 12 graphs, whilst the corresponding graphs for the H-tRNA-C display only two triangular prisms. The graphs exhibit disjoint clusters of amino acids when their polar requirement values are used. We contend that the S-tRNA-C is in a frozen-like state, whereas the H-tRNA-C may be in an evolving state.

  14. Integrating the intrinsic conformational preferences of non-coded α-amino acids modified at the peptide bond into the NCAD database

    PubMed Central

    Revilla-López, Guillem; Rodríguez-Ropero, Francisco; Curcó, David; Torras, Juan; Calaza, M. Isabel; Zanuy, David; Jiménez, Ana I.; Cativiela, Carlos; Nussinov, Ruth; Alemán, Carlos

    2011-01-01

    Recently, we reported a database (NCAD, Non-Coded Amino acids Database; http://recerca.upc.edu/imem/index.htm) that was built to compile information about the intrinsic conformational preferences of non-proteinogenic residues determined by quantum mechanical calculations, as well as bibliographic information about their synthesis, physical and spectroscopic characterization, the experimentally-established conformational propensities, and applications (J. Phys. Chem. B 2010, 114, 7413). The database initially contained the information available for α-tetrasubstituted α-amino acids. In this work, we extend NCAD to three families of compounds, which can be used to engineer peptides and proteins incorporating modifications at the –NHCO– peptide bond. Such families are: N-substituted α-amino acids, thio-α-amino acids, and diamines and diacids used to build retropeptides. The conformational preferences of these compounds have been analyzed and described based on the information captured in the database. In addition, we provide an example of the utility of the database and of the compounds it compiles in protein and peptide engineering. Specifically, the symmetry of a sequence engineered to stabilize the 3₁₀-helix with respect to the α-helix has been broken without perturbing significantly the secondary structure through targeted replacements using the information contained in the database. PMID:21491493

  15. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F.W.

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient. 2 figs.

  16. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F. William

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient.
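
    The library sizes quoted above can be put in context with a little arithmetic. The sketch below compares each library with the number of distinct primers possible at that length (4 bases per position). The library sizes come from the abstract; the "fraction of all possible primers" framing and the function names are assumptions made for illustration, not part of the patent.

```python
# Library sizes as quoted in the abstract: primer length -> number of primers.
LIBRARIES = {8: 10_000, 9: 14_200, 10: 44_000}


def possible_primers(length: int) -> int:
    """Number of distinct DNA sequences of a given length (A, C, G, T)."""
    return 4 ** length


def library_fraction(length: int, library_size: int) -> float:
    """Share of all possible primers of this length held in the library."""
    return library_size / possible_primers(length)


for length, size in LIBRARIES.items():
    total = possible_primers(length)
    share = library_fraction(length, size)
    print(f"{length}-mers: {size:,} of {total:,} possible ({share:.1%})")
```

    Under these assumptions, each library covers only a small share of the full sequence space at its primer length.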

  17. Full-length coding sequence for 12 bovine viral diarrhea virus isolates from persistently infected cattle in a feedyard in Kansas

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We report here the full-length coding sequence of 12 bovine viral diarrhea virus (BVDV) isolates from persistently infected cattle from a feedyard in southwest Kansas, USA. These 12 genomes represent the three major genotypes (BVDV 1a, 1b, and 2a) of BVDV currently circulating in the United States....

  18. Association of low-frequency and rare coding-sequence variants with blood lipids and Coronary Heart Disease in 56,000 whites and blacks

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Low-frequency coding DNA sequence variants in the proprotein convertase subtilisin/kexin type 9 gene (PCSK9) lower plasma low-density lipoprotein cholesterol (LDL-C), protect against risk of coronary heart disease (CHD), and have prompted the development of a new class of therapeutics. It is uncerta...

  19. Reaction sequences in simulated neutralized current acid waste slurry during processing with formic acid

    SciTech Connect

    Smith, H.D.; Wiemers, K.D.; Langowski, M.H.; Powell, M.R.; Larson, D.E.

    1993-11-01

    The Hanford Waste Vitrification Plant (HWVP) is being designed for the Department of Energy to immobilize high-level and transuranic wastes as glass for permanent disposal. Pacific Northwest Laboratory is supporting the HWVP design activities by conducting laboratory-scale studies using a HWVP simulated waste slurry. Conditions which affect the slurry processing chemistry were evaluated in terms of offgas composition and peak generation rate and changes in slurry composition. A standard offgas profile defined in terms of three reaction phases, decomposition of H₂CO₃, destruction of NO₂⁻, and production of H₂ and NH₃ was used as a baseline against which changes were evaluated. The test variables include nitrite concentration, acid neutralization capacity, temperature, and formic acid addition rate. Results to date indicate that pH is an important parameter influencing the N₂O/NOₓ generation ratio; nitrite can both inhibit and activate rhodium as a catalyst for formic acid decomposition to CO₂ and H₂; and a separate reduced metal phase forms in the reducing environment. These data are being compiled to provide a basis for predicting the HWVP feed processing chemistry as a function of feed composition and operation variables, recommending criteria for chemical adjustments, and providing guidelines with respect to important control parameters to consider during routine and upset plant operation.

  20. Complete mitogenome sequences of four flatfishes (Pleuronectiformes) reveal a novel gene arrangement of L-strand coding genes

    PubMed Central

    2013-01-01

    Background: Few mitochondrial gene rearrangements are found in vertebrates and large-scale changes in these genomes occur even less frequently. It is difficult, therefore, to propose a mechanism to account for observed changes in mitogenome structure. Mitochondrial gene rearrangements are usually explained by the recombination model or tandem duplication and random loss model. Results: In this study, the complete mitochondrial genomes of four flatfishes, Crossorhombus azureus (blue flounder), Grammatobothus krempfi, Pleuronichthys cornutus, and Platichthys stellatus were determined. A striking finding is that eight genes in the C. azureus mitogenome are located in a novel position, differing from that of available vertebrate mitogenomes. Specifically, the ND6 and seven tRNA genes (the Q, A, C, Y, S1, E, P genes) encoded by the L-strand have been translocated to a position between tRNA-T and tRNA-F though the original order of the genes is maintained. Conclusions: These special features are used to suggest a mechanism for C. azureus mitogenome rearrangement. First, a dimeric molecule was formed by two monomers linked head-to-tail, then one of the two sets of promoters lost function and the genes controlled by the disabled promoters became pseudogenes or non-coding sequences, or were even lost from the genome. This study provides a new gene-rearrangement model that accounts for the events of gene-rearrangement in a vertebrate mitogenome. PMID:23962312

  1. The complete amino acid sequence of lectin-C from the roots of pokeweed (Phytolacca americana).

    PubMed

    Yamaguchi, K; Mori, A; Funatsu, G

    1995-07-01

    The complete amino acid sequence of pokeweed lectin-C (PL-C) consisting of 126 residues has been determined. PL-C is an acidic simple protein with molecular mass of 13,747 Da and consists of three cysteine-rich domains with 51-63% homology. PL-C shows homology to chitin-binding proteins such as wheat germ agglutinin, and all eight cysteine residues in the three domains of PL-C are completely conserved in all other chitin-binding domains.

  2. Amino-acid sequence of a cooperative, dimeric myoglobin from the gastropod mollusc, Buccinum undatum L.

    PubMed

    Wen, D; Laursen, R A

    1994-10-19

    The complete amino-acid sequence of a dimeric myoglobin from the radular muscle of the gastropod mollusc, Buccinum undatum L., has been determined. The globin, which shows cooperative binding of oxygen, contains 146 amino acids, is N-terminal aminoacetylated, and has histidine residues at positions 65 and 97, corresponding to the heme-binding histidines seen in mammalian myoglobins. It shows about 75% and 50% homology, respectively, with the dimeric molluscan myoglobins from Busycon canaliculatum and Cerithidea rhizophorarum, the former of which also shows weak cooperativity, but much less similarity to other species of myoglobin and hemoglobin.

  3. DNA as a Binary Code: How the Physical Structure of Nucleotide Bases Carries Information

    ERIC Educational Resources Information Center

    McCallister, Gary

    2005-01-01

    The DNA triplet code also functions as a binary code. Because double-ring compounds cannot bind to double-ring compounds in the DNA code, the sequence of bases classified simply as purines or pyrimidines can encode for smaller groups of possible amino acids. This is an intuitive approach to teaching the DNA code. (Contains 6 figures.)
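
    The purine/pyrimidine reduction described above can be sketched in a few lines. This is a minimal illustration, not the article's own method: the choice of 1 for purines and 0 for pyrimidines is arbitrary, and the function names are hypothetical.

```python
PURINES = {"A", "G"}      # double-ring bases
PYRIMIDINES = {"C", "T"}  # single-ring bases

# Watson-Crick partner of each base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_binary(dna: str) -> str:
    """Encode a DNA string as bits: 1 for a purine, 0 for a pyrimidine."""
    bits = []
    for base in dna.upper():
        if base in PURINES:
            bits.append("1")
        elif base in PYRIMIDINES:
            bits.append("0")
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return "".join(bits)


def invert(bits: str) -> str:
    """Flip every bit in a binary string."""
    return "".join("1" if b == "0" else "0" for b in bits)


strand = "ATGGCC"
partner = strand.translate(COMPLEMENT)  # base-by-base complement, same order
print(to_binary(strand))   # 101100
print(to_binary(partner))  # 010011
```

    Because every base pair joins one purine to one pyrimidine, the partner strand's bit string is the bitwise inverse of the original, which `invert` makes explicit.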

  4. Amino acid sequence of human cholinesterase. Annual report, 30 September 1984-30 September 1985

    SciTech Connect

    Lockridge, O.

    1985-10-01

    The active-site serine residue is located 198 amino acids from the N-terminal. The active-site peptide was isolated from three different genetic types of human serum cholinesterase: from usual, atypical, and atypical-silent genotypes. It was found that the amino acid sequence of the active-site peptide was identical in all three genotypes. Comparison of the complete sequences of cholinesterase from human serum and acetylcholinesterase from the electric organ of Torpedo californica shows an identity of 53%. Cholinesterase is of interest to the Department of Defense because cholinesterase protects against organophosphate poisons of the type used in chemical warfare. The structural results presented here will serve as the basis for cloning the gene for cholinesterase. The potential uses of large amounts of cholinesterase would be for cleaning up spills of organophosphates and possibly for detoxifying exposed personnel.

  5. Amino acid sequence differences in pancreatic ribonucleases from water buffalo breeds from Indonesia and Italy.

    PubMed

    Sidik, A; Martena, B; Beintema, J J

    1979-12-01

    The amino acid sequences of the pancreatic ribonucleases from river-breed water buffaloes from Italy and swamp-breed water buffaloes from Indonesia differ at three positions. One of the differences involves a replacement of asparagine-34, with covalently attached carbohydrate on all molecules, in the river-breed enzyme by serine in the swamp-breed enzyme. The ribonuclease content of the pancreas differs considerably between breeds and is lower in river buffaloes. A ribonuclease preparation from two swamp buffaloes contained a minor glycosylated component. Preliminary evidence was obtained that the amino acid sequence of this component has factors in common with the main component of the swamp-breed ribonuclease and with the river-breed enzyme.

  6. Stereochemical Sequence Ion Selectivity: Proline versus Pipecolic-acid-containing Protonated Peptides

    NASA Astrophysics Data System (ADS)

    Abutokaikah, Maha T.; Guan, Shanshan; Bythell, Benjamin J.

    2016-10-01

    Substitution of proline by pipecolic acid, the six-membered ring congener of proline, results in vastly different tandem mass spectra. The well-known proline effect is eliminated and amide bond cleavage C-terminal to pipecolic acid dominates instead. Why do these two ostensibly similar residues produce dramatically differing spectra? Recent evidence indicates that the proton affinities of these residues are similar, so are unlikely to explain the result [Raulfs et al., J. Am. Soc. Mass Spectrom. 25, 1705-1715 (2014)]. An additional hypothesis based on increased flexibility was also advocated. Here, we provide a computational investigation of the "pipecolic acid effect," to test this and other hypotheses to determine if theory can shed additional light on this fascinating result. Our calculations provide evidence for both the increased flexibility of pipecolic-acid-containing peptides, and structural changes in the transition structures necessary to produce the sequence ions. The most striking computational finding is inversion of the stereochemistry of the transition structures leading to "proline effect"-type amide bond fragmentation between the proline/pipecolic acid-congeners: R (proline) to S (pipecolic acid). Additionally, our calculations predict substantial stabilization of the amide bond cleavage barriers for the pipecolic acid congeners by reduction in deleterious steric interactions and provide evidence for the importance of experimental energy regime in rationalizing the spectra.

  7. On human disease-causing amino acid variants: statistical study of sequence and structural patterns

    PubMed Central

    Alexov, Emil

    2015-01-01

    Statistical analysis was carried out on a large set of naturally occurring human amino acid variations and it was demonstrated that there is a preference for some amino acid substitutions to be associated with diseases. At an amino acid sequence level, it was shown that the disease-causing variants frequently involve drastic changes of amino acid physico-chemical properties of proteins such as charge, hydrophobicity and geometry. Structural analysis of variants involved in diseases and being frequently observed in the human population showed similar trends: disease-causing variants tend to cause more changes of hydrogen bond network and salt bridges as compared with harmless amino acid mutations. Analysis of thermodynamic data reported in the literature, both experimental and computational, indicated that disease-causing variants tend to destabilize proteins and their interactions, which prompted us to investigate the effects of amino acid mutations on large databases of experimentally measured energy changes in unrelated proteins. Although the experimental datasets were neither linked to diseases nor exclusive to human proteins, the observed trends were the same: amino acid mutations tend to destabilize proteins and their interactions. Bearing in mind that structural and thermodynamic properties are interrelated, it is pointed out that any large change of any of them is anticipated to cause a disease. PMID:25689729

  8. Self-sequencing of amino acids and origins of polyfunctional protocells

    NASA Technical Reports Server (NTRS)

    Fox, S. W.

    1984-01-01

    The role of proteins in the origin of living things is discussed. It has been experimentally established that amino acids can sequence themselves under simulated geological conditions with highly nonrandom products which accordingly contain diverse information. Multiple copies of each type of macromolecule are formed, resulting in greater power for any protoenzymic molecule than would accrue from a single copy of each type. Thermal proteins are readily incorporated into laboratory protocells. The experimental evidence for original polyfunctional protocells is discussed.

  9. Comparisons of the Distribution of Nucleotides and Common Sequences in Deoxyribonucleic Acid from Selected Bacteriophages

    PubMed Central

    Skalka, A.; Hanson, P.

    1972-01-01

    Results from comparisons of deoxyribonucleic acid (DNA) from several classes of bacteriophages suggest that most phage chromosomes contain either a homogeneous distribution of nucleotides or are made up of a few, rather large segments of different guanine plus cytosine (G + C) contents which are internally homogeneous. Among those temperate phages tested, most contained segmented DNA. Comparisons of sequence similarities among segments from lambdoid phage DNA species revealed the following order in relatedness to λ: 82 (and 434) > 21 > 424 > φ80. Most common sequences are found in the highest G + C segments, which in λ contain head and tail genes. Hybridization tests with λ and 186 or P2 DNA species verified that the lambdoids and 186 and P2 belong to two distinct groups. There are fewer homologous sequences between the DNA species of coliphages λ and P2 or 186 than there are between the DNA species of coliphage λ and salmonella phage P22. PMID:4553679

  10. Structure of the fully modified left-handed cyclohexene nucleic acid sequence GTGTACAC.

    PubMed

    Robeyns, Koen; Herdewijn, Piet; Van Meervelt, Luc

    2008-02-13

    CeNA oligonucleotides consist of a phosphorylated backbone where the deoxyribose sugars are replaced by cyclohexene moieties. The X-ray structure determination and analysis of a fully modified octamer sequence GTGTACAC, which is the first crystal structure of a carbocyclic-based nucleic acid, is presented. This particular sequence was built with left-handed building blocks and crystallizes as a left-handed double helix. The helix can be characterized as belonging to the (mirrored) A-type family. Crystallographic data were processed up to 1.53 Å, and the octamer sequence crystallizes in the space group R32. The sugar puckering is found to adopt the ³H₂ half-chair conformation which mimics the C3'-endo conformation of the ribose sugar. The double helices stack on top of each other to form continuous helices, and static disorder is observed due to this end-to-end stacking.
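
    The abstract does not say so explicitly, but the octamer GTGTACAC reads the same as its own reverse complement, which is consistent with a single sequence pairing with itself to form a double helix. The check below is a small sketch that assumes standard Watson-Crick pairing of the bases (A with T, G with C) and ignores the modified cyclohexene backbone entirely.

```python
# Watson-Crick partners; assumed to hold for the bases of the CeNA octamer.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written in the same reading direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


octamer = "GTGTACAC"
is_self_complementary = reverse_complement(octamer) == octamer
print(reverse_complement(octamer), is_self_complementary)
```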

  11. Amino acid sequence of a protease inhibitor isolated from Sarcophaga bullata determined by mass spectrometry.

    PubMed

    Papayannopoulos, I A; Biemann, K

    1992-02-01

    The amino acid sequence of a protease inhibitor isolated from the hemolymph of Sarcophaga bullata larvae was determined by tandem mass spectrometry. Homology considerations with respect to other protease inhibitors with known primary structures assisted in the choice of the procedure followed in the sequence determination and in the alignment of the various peptides obtained from specific chemical cleavage at cysteines and enzyme digests of the S. bullata protease inhibitor. The resulting sequence of 57 residues is as follows: Val Asp Lys Ser Ala Cys Leu Gln Pro Lys Glu Val Gly Pro Cys Arg Lys Ser Asp Phe Val Phe Phe Tyr Asn Ala Asp Thr Lys Ala Cys Glu Glu Phe Leu Tyr Gly Gly Cys Arg Gly Asn Asp Asn Arg Phe Asn Thr Lys Glu Glu Cys Glu Lys Leu Cys Leu.
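
    As a quick consistency check, the three-letter sequence quoted above can be converted to one-letter code and counted. The sketch below uses the standard one-letter amino acid codes; the variable and function names are illustrative only.

```python
# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Sequence copied from the abstract.
INHIBITOR_3LETTER = (
    "Val Asp Lys Ser Ala Cys Leu Gln Pro Lys Glu Val Gly Pro Cys Arg Lys Ser "
    "Asp Phe Val Phe Phe Tyr Asn Ala Asp Thr Lys Ala Cys Glu Glu Phe Leu Tyr "
    "Gly Gly Cys Arg Gly Asn Asp Asn Arg Phe Asn Thr Lys Glu Glu Cys Glu Lys "
    "Leu Cys Leu"
)


def to_one_letter(three_letter_seq: str) -> str:
    """Convert a space-separated three-letter sequence to one-letter code."""
    return "".join(THREE_TO_ONE[res] for res in three_letter_seq.split())


inhibitor = to_one_letter(INHIBITOR_3LETTER)
print(inhibitor)
print(len(inhibitor), "residues,", inhibitor.count("C"), "cysteines")
```

    The count comes to 57 residues, matching the abstract, with six cysteines among them.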

  12. Some properties and amino acid sequence of plastocyanin from a green alga, Ulva arasakii.

    PubMed

    Yoshizaki, F; Fukazawa, T; Mishina, Y; Sugimura, Y

    1989-08-01

    Plastocyanin was purified from a multicellular, marine green alga, Ulva arasakii, by conventional methods to homogeneity. The oxidized plastocyanin showed absorption maxima at 252, 276.8, 460, 595.3, and 775 nm, and shoulders at 259, 265, 269, and 282.5 nm; the ratio A276.8/A595.3 was 1.5. The midpoint redox potential was determined to be 0.356 V at pH 7.0 with a ferri- and ferrocyanide system. The molecular weight was estimated to be 10,200 and 11,000 by SDS-PAGE and by gel filtration, respectively. U. arasakii also has a small amount of cytochrome c6, like Enteromorpha prolifera. The amino acid sequence of U. arasakii plastocyanin was determined by Edman degradation and by carboxypeptidase digestion of the plastocyanin, six tryptic peptides, and five staphylococcal protease peptides. The plastocyanin contained 98 amino acid residues, giving a molecular weight of 10,236 including one copper atom. The complete sequence is as follows: AQIVKLGGDDGALAFVPSKISVAAGEAIEFVNNAGFPHNIVFDEDAVPAGVDADAISYDDYLNSKGETVVRKLSTPGVYGVYCEPHAGAGMKMTITVQ. The sequence of U. arasakii plastocyanin is closest to that of the E. prolifera protein (85% homology). A phylogenetic tree of five algal and two higher plant plastocyanins was constructed by comparing the amino acid differences. The branching order is considered to be as follows: a blue-green alga, unicellular green algae, multicellular green algae, and higher plants. PMID:2509442

  13. Characterization of the microbial acid mine drainage microbial community using culturing and direct sequencing techniques.

    PubMed

    Auld, Ryan R; Myre, Maxine; Mykytczuk, Nadia C S; Leduc, Leo G; Merritt, Thomas J S

    2013-05-01

    We characterized the bacterial community from an AMD tailings pond using both classical culturing and modern direct sequencing techniques and compared the two methods. Acid mine drainage (AMD) is produced by the environmental and microbial oxidation of minerals dissolved from mining waste. Surprisingly, we know little about the microbial communities associated with AMD, despite the fundamental ecological roles of these organisms and large-scale economic impact of these waste sites. AMD microbial communities have classically been characterized by laboratory culturing-based techniques and more recently by direct sequencing of marker gene sequences, primarily the 16S rRNA gene. In our comparison of the techniques, we find that their results are complementary, overall indicating very similar community structure with similar dominant species, but with each method identifying some species that were missed by the other. We were able to culture the majority of species that our direct sequencing results indicated were present, primarily species within the Acidithiobacillus and Acidiphilium genera, although estimates of relative species abundance were only obtained from direct sequencing. Interestingly, our culture-based methods recovered four species that had been overlooked from our sequencing results because of the rarity of the marker gene sequences, likely members of the rare biosphere. Further, direct sequencing indicated that a single genus, completely missed in our culture-based study, Legionella, was a dominant member of the microbial community. Our results suggest that while either method does a reasonable job of identifying the dominant members of the AMD microbial community, together the methods combine to give a more complete picture of the true diversity of this environment. PMID:23485423

  14. Complete amino acid sequence of chitinase-A from leaves of pokeweed (Phytolacca americana).

    PubMed

    Yamagami, T; Tanigawa, M; Ishiguro, M; Funatsu, G

    1998-04-01

    The complete amino acid sequence of pokeweed leaf chitinase-A was determined. First all 11 tryptic peptides from the reduced and S-carboxymethylated form of the enzyme were sequenced. Then the same form of the enzyme was cleaved with cyanogen bromide, giving three fragments. The fragments were digested with chymotrypsin or Staphylococcus aureus V8 protease. Last, the 11 tryptic peptides were put in order. Of seven cysteine residues, six were linked by disulfide bonds (between Cys25 and Cys74, Cys89 and Cys98, and Cys195 and Cys208); Cys176 was free. The enzyme consisted of 208 amino acid residues and had a molecular weight of 22,391. It consisted of only one polypeptide chain without a chitin-binding domain. The length of the chain was almost the same as that of the catalytic domains of class IL chitinases. These findings suggested that this enzyme is a new kind of class IIL chitinase, although its sequence resembles that of catalytic domains of class IL chitinases more than that of the class IIL chitinases reported so far. Discussion on the involvement of specific tryptophan residue in the active site of PLC-A is also given based on the sequence similarity with rye seed chitinase-c.

  15. Metazoan remaining genes for essential amino acid biosynthesis: sequence conservation and evolutionary analyses.

    PubMed

    Costa, Igor R; Thompson, Julie D; Ortega, José Miguel; Prosdocimi, Francisco

    2014-12-24

    Essential amino acids (EAA) consist of a group of nine amino acids that animals are unable to synthesize via de novo pathways. Recently, it has been found that most metazoans lack the same set of enzymes responsible for the de novo EAA biosynthesis. Here we investigate the sequence conservation and evolution of all the metazoan remaining genes for EAA pathways. Initially, the set of all 49 enzymes responsible for the EAA de novo biosynthesis in yeast was retrieved. These enzymes were used as BLAST queries to search for similar sequences in a database containing 10 complete metazoan genomes. Eight enzymes typically attributed to EAA pathways were found to be ubiquitous in metazoan genomes, suggesting a conserved functional role. In this study, we address the question of how these genes evolved after losing their pathway partners. To do this, we compared metazoan genes with their fungal and plant orthologs. Using phylogenetic analysis with maximum likelihood, we found that acetolactate synthase (ALS) and betaine-homocysteine S-methyltransferase (BHMT) diverged from the expected Tree of Life (ToL) relationships. High sequence conservation in the paraphyletic group Plant-Fungi was identified for these two genes using a newly developed Python algorithm. Selective pressure analysis of ALS and BHMT protein sequences showed higher non-synonymous mutation ratios in comparisons between metazoans/fungi and metazoans/plants, supporting the hypothesis that these two genes have undergone non-ToL evolution in animals.

  16. The amino acid sequence of the aspartate aminotransferase from baker's yeast (Saccharomyces cerevisiae).

    PubMed Central

    Cronin, V B; Maras, B; Barra, D; Doonan, S

    1991-01-01

    1. The single (cytosolic) aspartate aminotransferase was purified in high yield from baker's yeast (Saccharomyces cerevisiae). 2. Amino-acid-sequence analysis was carried out by digestion of the protein with trypsin and with CNBr; some of the peptides produced were further subdigested with Staphylococcus aureus V8 proteinase or with pepsin. Peptides were sequenced by the dansyl-Edman method and/or by automated gas-phase methods. The amino acid sequence obtained was complete except for a probable gap of two residues as indicated by comparison with the structures of counterpart proteins in other species. 3. The N-terminus of the enzyme is blocked. Fast-atom-bombardment m.s. was used to identify the blocking group as an acetyl one. 4. Alignment of the sequence of the enzyme with those of vertebrate cytosolic and mitochondrial aspartate aminotransferases and with the enzyme from Escherichia coli showed that about 25% of residues are conserved between these distantly related forms. 5. Experimental details and confirmatory data for the results presented here are given in a Supplementary Publication (SUP 50164, 25 pages) that has been deposited at the British Library Document Supply Centre, Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1991) 273, 5. PMID:1859361

  17. [MOLECULAR EVOLUTION OF ION CHANNELS: AMINO ACID SEQUENCES AND 3D STRUCTURES].

    PubMed

    Korkosh, V S; Zhorov, B S; Tikhonov, D B

    2016-01-01

    An integral part of modern evolutionary biology is comparative analysis of structure and function of macromolecules such as proteins. The first and critical step to understand evolution of homologous proteins is their amino acid sequence alignment. However, standard algorithms do not provide unambiguous sequence alignments for proteins of poor homology. More reliable results can be obtained by comparing experimental 3D structures obtained at atomic resolution, for instance, with the aid of X-ray structural analysis. If such structures are lacking, homology modeling is used, which may take into account indirect experimental data on functional roles of individual amino-acid residues. An important problem is that the sequence alignment, which reflects genetic modifications, does not necessarily correspond to the functional homology. The latter depends on three-dimensional structures which are critical for natural selection. Since alignment techniques relying only on the analysis of primary structures carry no information on the functional properties of proteins, including 3D structures into consideration is very important. Here we consider several examples involving ion channels and demonstrate that alignment of their three-dimensional structures can significantly improve sequence alignments obtained by traditional methods.

  18. Immunoreactivity of polyclonal antibodies generated against the carboxy terminus of the predicted amino acid sequence of the Huntington disease gene

    SciTech Connect

    Alkatib, G.; Graham, R.; Pelmear-Telenius, A.

    1994-09-01

    A cDNA fragment spanning the 3′-end of the Huntington disease gene (from 8052 to 9252) was cloned into a prokaryotic expression vector containing the E. coli lac promoter and a portion of the coding sequence for β-galactosidase. The truncated β-galactosidase gene was cleaved with BamHI and fused in frame to the BamHI fragment of the Huntington disease gene 3′-end. Expression analysis of proteins made in E. coli revealed that 20-30% of the total cellular protein was represented by the β-galactosidase-huntingtin fusion protein. The identity of the Huntington disease protein amino acid sequences was confirmed by protein sequence analysis. Affinity chromatography was used to purify large quantities of the fusion protein from bacterial cell lysates. Affinity-purified proteins were used to immunize New Zealand white rabbits for antibody production. The generated polyclonal antibodies were used to immunoprecipitate the Huntington disease gene product expressed in a neuroblastoma cell line. In this cell line the antibodies precipitated two protein bands of apparent gel migrations of 200 and 150 kd which together correspond to the calculated molecular weight of the Huntington disease gene product (350 kd). Immunoblotting experiments revealed the presence of a large precursor protein in the range of 350-750 kd which is in agreement with the predicted molecular weight of the protein without post-translational modifications. These results indicate that the huntingtin protein is cleaved into two subunits in this neuroblastoma cell line and imply that cleavage of a large precursor protein may contribute to its biological activity. Experiments are ongoing to determine the precursor-product relationship, to examine the synthesis of the huntingtin protein in freshly isolated rat brains, and to determine cellular and subcellular distribution of the gene product.

  19. Analysis of a nucleotide-binding site of 5-lipoxygenase by affinity labelling: binding characteristics and amino acid sequences.

    PubMed Central

    Zhang, Y Y; Hammarberg, T; Radmark, O; Samuelsson, B; Ng, C F; Funk, C D; Loscalzo, J

    2000-01-01

    5-Lipoxygenase (5LO) catalyses the first two steps in the biosynthesis of leukotrienes, which are inflammatory mediators derived from arachidonic acid. 5LO activity is stimulated by ATP; however, a consensus ATP-binding site or nucleotide-binding site has not been found in its protein sequence. In the present study, affinity and photoaffinity labelling of 5LO with 5'-p-fluorosulphonylbenzoyladenosine (FSBA) and 2-azido-ATP showed that 5LO bound to the ATP analogues quantitatively and specifically and that the incorporation of either analogue inhibited ATP stimulation of 5LO activity. The stoichiometry of the labelling was 1.4 mol of FSBA/mol of 5LO (of which ATP competed with 1 mol/mol) or 0.94 mol of 2-azido-ATP/mol of 5LO (of which ATP competed with 0.77 mol/mol). Labelling with FSBA prevented further labelling with 2-azido-ATP, indicating that the same binding site was occupied by both analogues. Other nucleotides (ADP, AMP, GTP, CTP and UTP) also competed with 2-azido-ATP labelling, suggesting that the site was a general nucleotide-binding site rather than a strict ATP-binding site. Ca²⁺, which also stimulates 5LO activity, had no effect on the labelling of the nucleotide-binding site. Digestion with trypsin and peptide sequencing showed that two fragments of 5LO were labelled by 2-azido-ATP. These fragments correspond to residues 73-83 (KYWLNDDWYLK, in single-letter amino acid code) and 193-209 (FMHMFQSSWNDFADFEK) in the 5LO sequence. Trp-75 and Trp-201 in these peptides were modified by the labelling, suggesting that they were immediately adjacent to the C-2 position of the adenine ring of ATP. Given the stoichiometry of the labelling, the two peptide sequences of 5LO were probably near each other in the enzyme's tertiary structure, composing or surrounding the ATP-binding site of 5LO. PMID:11042125
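
    The residue numbering quoted above can be cross-checked against the two peptide sequences. The sketch below maps a residue number in the full 5LO sequence to a letter within each peptide, using only the start positions and sequences given in the abstract; the helper name `residue_at` is hypothetical.

```python
# Start residue in the 5LO sequence -> labelled peptide, as given in the abstract.
PEPTIDES = {
    73: "KYWLNDDWYLK",
    193: "FMHMFQSSWNDFADFEK",
}


def residue_at(start: int, peptide: str, position: int) -> str:
    """Return the one-letter residue at a full-sequence position within a peptide."""
    offset = position - start
    if not 0 <= offset < len(peptide):
        raise IndexError(f"position {position} is outside this peptide")
    return peptide[offset]


for start, peptide in PEPTIDES.items():
    end = start + len(peptide) - 1
    print(f"residues {start}-{end}: {peptide}")

print(residue_at(73, PEPTIDES[73], 75))     # residue 75
print(residue_at(193, PEPTIDES[193], 201))  # residue 201
```

    Both lookups return W (tryptophan), and the computed end positions are 83 and 209, so the peptide ranges and the Trp-75 and Trp-201 assignments in the abstract agree with one another.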

  20. Hfq assists small RNAs in binding to the coding sequence of ompD mRNA and in rearranging its structure

    PubMed Central

    Wroblewska, Zuzanna; Olejniczak, Mikolaj

    2016-01-01

    The bacterial protein Hfq participates in the regulation of translation by small noncoding RNAs (sRNAs). Several mechanisms have been proposed to explain the role of Hfq in the regulation by sRNAs binding to the 5′-untranslated mRNA regions. However, it remains unknown how Hfq affects those sRNAs that target the coding sequence. Here, the contribution of Hfq to the annealing of three sRNAs, RybB, SdsR, and MicC, to the coding sequence of Salmonella ompD mRNA was investigated. Hfq bound to ompD mRNA with tight, subnanomolar affinity. Moreover, Hfq strongly accelerated the rates of annealing of RybB and MicC sRNAs to this mRNA, and it also had a small effect on the annealing of SdsR. The experiments using truncated RNAs revealed that the contributions of Hfq to the annealing of each sRNA were individually adjusted depending on the structures of interacting RNAs. In agreement with that, the mRNA structure probing revealed different structural contexts of each sRNA binding site. Additionally, the annealing of RybB and MicC sRNAs induced specific conformational changes in ompD mRNA consistent with local unfolding of mRNA secondary structure. Finally, the mutation analysis showed that the long AU-rich sequence in the 5′-untranslated mRNA region served as an Hfq binding site essential for the annealing of sRNAs to the coding sequence. Overall, the data showed that the functional specificity of Hfq in the annealing of each sRNA to the ompD mRNA coding sequence was determined by the sequence and structure of the interacting RNAs. PMID:27154968

  1. Complete Genome Sequence of a thermotolerant sporogenic lactic acid bacterium, Bacillus coagulans strain 36D1

    PubMed Central

    Rhee, Mun Su; Moritz, Brélan E.; Xie, Gary; Glavina del Rio, T.; Dalin, E.; Tice, H.; Bruce, D.; Goodwin, L.; Chertkov, O.; Brettin, T.; Han, C.; Detter, C.; Pitluck, S.; Land, Miriam L.; Patel, Milind; Ou, Mark; Harbrucker, Roberta; Ingram, Lonnie O.; Shanmugam, K. T.

    2011-01-01

    Bacillus coagulans is a ubiquitous soil bacterium that grows at 50-55 °C and pH 5.0 and ferments various sugars that constitute plant biomass to L(+)-lactic acid. The ability of this sporogenic lactic acid bacterium to grow at 50-55 °C and pH 5.0 makes it an attractive microbial biocatalyst for the production of optically pure lactic acid at industrial scale, not only from glucose derived from cellulose but also from xylose, a major constituent of hemicellulose. This bacterium is also considered a potential probiotic. The complete genome sequence of a representative strain, B. coagulans strain 36D1, is presented and discussed. PMID:22675583

  2. BeadCons: detection of nucleic acid sequences by flow cytometry.

    PubMed

    Horejsh, Douglas; Martini, Federico; Capobianchi, Maria Rosaria

    2005-11-01

    Molecular beacons are single-stranded nucleic acid structures with a terminal fluorophore and a distal, terminal quencher. These molecules are typically used in real-time PCR assays, but have also been conjugated with solid matrices. This unit describes protocols related to molecular beacon-conjugated beads (BeadCons), whose specific hybridization with complementary target sequences can be resolved by cytometry. Assay sensitivity is achieved through the concentration of fluorescence signal on discrete particles. By using molecular beacons with different fluorophores and microspheres of different sizes, it is possible to construct a fluid array system with each bead corresponding to a specific target nucleic acid. Methods are presented for the design, construction, and use of BeadCons for the specific, multiplexed detection of unlabeled nucleic acids in solution. The use of bead-based detection methods will likely lead to the design of new multiplex molecular diagnostic tools.

  3. Measuring nanometer distances in nucleic acids using a sequence-independent nitroxide probe

    PubMed Central

    Qin, Peter Z; Haworth, Ian S; Cai, Qi; Kusnetzow, Ana K; Grant, Gian Paola G; Price, Eric A; Sowa, Glenna Z; Popova, Anna; Herreros, Bruno; He, Honghang

    2008-01-01

    This protocol describes the procedures for measuring nanometer distances in nucleic acids using a nitroxide probe that can be attached to any nucleotide within a given sequence. Two nitroxides are attached to phosphorothioates that are chemically substituted at specific sites of DNA or RNA. Inter-nitroxide distances are measured using a four-pulse double electron–electron resonance technique, and the measured distances are correlated to the parent structures using a Web-accessible computer program. Four to five days are needed for sample labeling, purification and distance measurement. The procedures described herein provide a method for probing global structures and studying conformational changes of nucleic acids and protein/nucleic acid complexes. PMID:17947978

  4. [Partial sequence homology of FtsZ in phylogenetics analysis of lactic acid bacteria].

    PubMed

    Zhang, Bin; Dong, Xiu-zhu

    2005-10-01

    FtsZ is a structurally conserved protein that is universal among the prokaryotes. It plays a key role in prokaryote cell division. A partial fragment of the ftsZ gene about 800 bp in length was amplified and sequenced, and a partial FtsZ protein phylogenetic tree for the lactic acid bacteria was constructed. Comparison of the FtsZ phylogenetic tree with the 16S rDNA tree showed that the two trees were similar in topology. Both trees revealed that Pediococcus spp. were closely related to the L. casei group of Lactobacillus spp., but less closely related to other lactic acid cocci such as Enterococcus and Streptococcus. The results also showed that the discriminative power of FtsZ was higher than that of 16S rDNA at either the inter-species or the inter-genus level, and that FtsZ could be a very useful tool in species identification of lactic acid bacteria. PMID:16342751

  6. The amino acid sequence of Lady Amherst's pheasant (Chrysolophus amherstiae) and golden pheasant (Chrysolophus pictus) egg-white lysozymes.

    PubMed

    Araki, T; Kuramoto, M; Torikata, T

    1990-09-01

    The amino acid sequences of Lady Amherst's pheasant and golden pheasant egg-white lysozymes have been determined. The carboxymethylated lysozymes were digested with trypsin, followed by sequencing of the tryptic peptides. Lady Amherst's pheasant lysozyme proved to consist of 129 amino acid residues, and a relative molecular mass of 14,423 Da was calculated. This lysozyme had six amino acid substitutions when compared with hen egg-white lysozyme: Phe3 to Tyr, His15 to Leu, Gln41 to His, Asn77 to His, Gln121 to Asn, and a newly found substitution of Ile124 to Thr. The amino acid sequence of golden pheasant lysozyme was identical to that of Lady Amherst's pheasant lysozyme. The phylogenetic tree constructed by comparing the amino acid sequences of phasianoid bird lysozymes revealed a minimum genetic distance between these pheasants and the turkey-peafowl group. PMID:1368578
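
    The substitution list in this abstract can be turned into compact single-letter notation. The following is a minimal sketch, assuming that the first residue in each pair is the hen egg-white lysozyme residue and the second is the pheasant residue, as the abstract's wording suggests. The function names and the small three-letter lookup table are illustrative and cover only the residues mentioned here.

```python
import re

# Three-letter to one-letter codes, limited to the residues named in the abstract.
THREE_TO_ONE = {
    "Phe": "F", "Tyr": "Y", "His": "H", "Leu": "L",
    "Gln": "Q", "Asn": "N", "Ile": "I", "Thr": "T",
}

# Substitution list as given in the abstract (hen residue and position, then pheasant residue).
SUBSTITUTIONS = (
    "Phe3 to Tyr, His15 to Leu, Gln41 to His, "
    "Asn77 to His, Gln121 to Asn, Ile124 to Thr"
)

# Matches e.g. "Phe3 to Tyr"; tolerates a stray space before the position number.
PATTERN = re.compile(r"([A-Z][a-z]{2})\s*(\d+) to ([A-Z][a-z]{2})")


def parse_substitutions(text):
    """Return a list of (position, hen_residue, pheasant_residue) tuples."""
    return [
        (int(pos), THREE_TO_ONE[old], THREE_TO_ONE[new])
        for old, pos, new in PATTERN.findall(text)
    ]


def short_form(substitutions):
    """Format each substitution in the common 'F3Y' style."""
    return [f"{old}{pos}{new}" for pos, old, new in substitutions]


if __name__ == "__main__":
    subs = parse_substitutions(SUBSTITUTIONS)
    print(len(subs), "substitutions:", ", ".join(short_form(subs)))
```

    Parsing the list yields six entries, matching the count stated in the abstract, and every position lies within the reported 129-residue chain.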

  7. Molecular cloning of the α-subunit of human prolyl 4-hydroxylase: The complete cDNA-derived amino acid sequence and evidence for alternative splicing of RNA transcripts

    SciTech Connect

    Helaakoski, T.; Vuori, K.; Myllylae, R.; Kivirikko, K.I.; Pihlajaniemi, T.

    1989-06-01

    Prolyl 4-hydroxylase, an α₂β₂ tetramer, catalyzes the formation of 4-hydroxyproline in collagens by the hydroxylation of proline residues in peptide linkages. The authors report here on the isolation of cDNA clones encoding the α-subunit of the enzyme from human tumor HT-1080, placenta, and fibroblast cDNA libraries. Eight overlapping clones covering almost all of the corresponding 3,000-nucleotide mRNA, including all the coding sequences, were characterized. These clones encode a polypeptide of 517 amino acid residues and a signal peptide of 17 amino acids. Previous characterization of cDNA clones for the β-subunit of prolyl 4-hydroxylase has indicated that its C terminus has the amino acid sequence Lys-Asp-Gly-Leu, which, it has been suggested, is necessary for the retention of a polypeptide within the lumen of the endoplasmic reticulum. The α-subunit does not have this C-terminal sequence, and thus one function of the β-subunit in the prolyl 4-hydroxylase tetramer appears to be to retain the enzyme within this cell organelle. Southern blot analyses of human genomic DNA with a cDNA probe for the α-subunit suggested the presence of only one gene encoding the two types of mRNA, which appear to result from mutually exclusive alternative splicing of primary transcripts of one gene.
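
    The figures in this abstract allow a rough estimate of how much of the mRNA is coding sequence. The sketch below is a back-of-the-envelope calculation, not a result from the paper. It assumes that the 517 residues are counted separately from the 17-residue signal peptide, that the open reading frame ends with a single stop codon, and that 3,000 nucleotides is an approximate mRNA length; the function name is hypothetical.

```python
# Values quoted in the abstract.
POLYPEPTIDE_RESIDUES = 517      # assumed here to exclude the signal peptide
SIGNAL_PEPTIDE_RESIDUES = 17
MRNA_LENGTH_NT = 3000           # approximate length stated in the abstract

NT_PER_CODON = 3


def coding_length_nt(polypeptide, signal_peptide, include_stop=True):
    """Nucleotides needed to encode the precursor, optionally with one stop codon."""
    codons = polypeptide + signal_peptide + (1 if include_stop else 0)
    return codons * NT_PER_CODON


if __name__ == "__main__":
    cds = coding_length_nt(POLYPEPTIDE_RESIDUES, SIGNAL_PEPTIDE_RESIDUES)
    untranslated = MRNA_LENGTH_NT - cds
    print("precursor residues:", POLYPEPTIDE_RESIDUES + SIGNAL_PEPTIDE_RESIDUES)
    print("coding sequence (nt, incl. stop):", cds)
    print("remaining untranslated (nt, approx.):", untranslated)
    print("coding fraction of mRNA:", round(cds / MRNA_LENGTH_NT, 3))
```

    Under these assumptions the coding sequence accounts for roughly half of the transcript, with the remainder in the untranslated regions.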

  8. Individual variation of human S1P₁ coding sequence leads to heterogeneity in receptor function and drug interactions.

    PubMed

    Obinata, Hideru; Gutkind, Sarah; Stitham, Jeremiah; Okuno, Toshiaki; Yokomizo, Takehiko; Hwa, John; Hla, Timothy

    2014-12-01

    Sphingosine 1-phosphate receptor 1 (S1P₁), an abundantly-expressed G protein-coupled receptor which regulates key vascular and immune responses, is a therapeutic target in autoimmune diseases. Fingolimod/Gilenya (FTY720), an oral medication for relapsing-remitting multiple sclerosis, targets S1P₁ receptors on immune and neural cells to suppress neuroinflammation. However, suppression of endothelial S1P₁ receptors is associated with cardiac and vascular adverse effects. Here we report the genetic variations of the S1P₁ coding region from exon sequencing of >12,000 individuals and their functional consequences. We conducted functional analyses of 14 nonsynonymous single nucleotide polymorphisms (SNPs) of the S1PR1 gene. One SNP mutant (Arg¹²⁰ to Pro) failed to transmit sphingosine 1-phosphate (S1P)-induced intracellular signals such as calcium increase and activation of p44/42 MAPK and Akt. Two other mutants (Ile⁴⁵ to Thr and Gly³⁰⁵ to Cys) showed normal intracellular signals but impaired S1P-induced endocytosis, which made the receptor resistant to FTY720-induced degradation. Another SNP mutant (Arg¹³ to Gly) demonstrated protection from coronary artery disease in a high cardiovascular risk population. Individuals with this mutation showed a significantly