Science.gov

Sample records for acid sequence coded

  1. Phenolic acid esterases, coding sequences and methods

    DOEpatents

    Blum, David L.; Kataeva, Irina; Li, Xin-Liang; Ljungdahl, Lars G.

    2002-01-01

    Described herein are four phenolic acid esterases, three of which correspond to domains of previously unknown function within bacterial xylanases, from XynY and XynZ of Clostridium thermocellum and from a xylanase of Ruminococcus. The fourth specifically exemplified xylanase is a protein encoded within the genome of Orpinomyces PC-2. The amino acids of these polypeptides and nucleotide sequences encoding them are provided. Recombinant host cells, expression vectors and methods for the recombinant production of phenolic acid esterases are also provided.

  2. Cellulases and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-01-01

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  3. Cellulases and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-02-20

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  4. Lichenase and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2000-08-15

    The present invention provides a fungal lichenase, i.e., an endo-1,3-1,4-β-D-glucanohydrolase, its coding sequence, recombinant DNA molecules comprising the lichenase coding sequences, recombinant host cells and methods for producing same. The present lichenase is from Orpinomyces PC-2.

  5. Mutation-selection models of coding sequence evolution with site-heterogeneous amino acid fitness profiles.

    PubMed

    Rodrigue, Nicolas; Philippe, Hervé; Lartillot, Nicolas

    2010-03-01

    Modeling the interplay between mutation and selection at the molecular level is key to evolutionary studies. To this end, codon-based evolutionary models have been proposed as pertinent means of studying long-range evolutionary patterns and are widely used. However, these approaches have not yet consolidated results from amino acid level phylogenetic studies showing that selection acting on proteins displays strong site-specific effects, which translate into heterogeneous amino acid propensities across the columns of alignments; related codon-level studies have instead focused on either modeling a single selective context for all codon columns, or a separate selective context for each codon column, with the former strategy deemed too simplistic and the latter deemed overparameterized. Here, we integrate recent developments in nonparametric statistical approaches to propose a probabilistic model that accounts for the heterogeneity of amino acid fitness profiles across the coding positions of a gene. We apply the model to a dozen real protein-coding gene alignments and find it to produce biologically plausible inferences, for instance, as pertaining to site-specific amino acid constraints, as well as distributions of scaled selection coefficients. In their account of mutational features as well as the heterogeneous regimes of selection at the amino acid level, the modeling approaches studied here can form a backdrop for several extensions, accounting for other selective features, for variable population size, or for subtleties of mutational features, all with parameterizations couched within population-genetic theory. PMID:20176949

  6. An Interpretation of the Ancestral Codon from Miller’s Amino Acids and Nucleotide Correlations in Modern Coding Sequences

    PubMed Central

    Carels, Nicolas; de Leon, Miguel Ponce

    2015-01-01

    Purine bias, which is usually referred to as an “ancestral codon”, is known to result in short-range correlations between nucleotides in coding sequences, and it is common in all species. We demonstrate that RWY is a more appropriate pattern than the classical RNY, and purine bias (Rrr) is the product of a network of nucleotide compensations induced by functional constraints on the physicochemical properties of proteins. Through deductions from universal correlation properties, we also demonstrate that amino acids from Miller’s spark discharge experiment are compatible with functional primeval proteins at the dawn of living cell radiation on earth. These amino acids match the hydropathy and secondary structures of modern proteins. PMID:25922573

  7. Numerical classification of coding sequences

    NASA Technical Reports Server (NTRS)

    Collins, D. W.; Liu, C. C.; Jukes, T. H.

    1992-01-01

    DNA sequences coding for protein may be represented by counts of nucleotides or codons. A complete reading frame may be abbreviated by its base count, e.g. A76C158G121T74, or with the corresponding codon table, e.g. (AAA)0(AAC)1(AAG)9 ... (TTT)0. We propose that these numerical designations be used to augment current methods of sequence annotation. Because base counts and codon tables do not require revision as knowledge of function evolves, they are well-suited to act as cross-references, for example to identify redundant GenBank entries. These descriptors may be compared, in place of DNA sequences, to extract homologous genes from large databases. This approach permits rapid searching with good selectivity.
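
    The two descriptors named in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function names and the toy reading frame are hypothetical, and the sketch assumes an uppercase A/C/G/T alphabet with any trailing partial codon ignored.

```python
from collections import Counter
from itertools import product


def base_count_descriptor(seq):
    """Return a base-count string in the A..C..G..T.. style, e.g. 'A5C2G3T2'."""
    counts = Counter(seq.upper())
    return "".join(f"{base}{counts.get(base, 0)}" for base in "ACGT")


def codon_table(seq):
    """Count in-frame codons; all 64 codons are listed in alphabetical order."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # assumption: drop a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    all_codons = ("".join(c) for c in product("ACGT", repeat=3))
    return {codon: counts.get(codon, 0) for codon in all_codons}


orf = "ATGGCCAAGTAA"  # hypothetical toy reading frame, not from the paper
print(base_count_descriptor(orf))
print({codon: n for codon, n in codon_table(orf).items() if n})
```

    Two database entries with identical descriptors are candidates for redundancy; a full sequence comparison is still needed to confirm it, because different sequences can share the same counts.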

  8. Nucleotide sequence of the nifH gene coding for nitrogen reductase in the acetic acid bacterium Acetobacter diazotrophicus.

    PubMed

    Franke, I H; Fegan, M; Hayward, A C; Sly, L I

    1998-01-01

    The nifH gene sequence of the nitrogen-fixing bacterium Acetobacter diazotrophicus was determined with the use of the polymerase chain reaction and universal degenerate oligonucleotide primers. The gene shows highest pair-wise similarity to the nifH gene of Azospirillum brasilense. The phylogenetic relationships of the nifH gene sequences were compared with those inferred from 16S rRNA gene sequences. Knowledge of the sequence of the nifH gene contributes to the growing database of nifH gene sequences, and will allow the detection of Acet. diazotrophicus from environmental samples with nifH gene-based primers. PMID:9489028

  9. Ancient DNA sequence revealed by error-correcting codes.

    PubMed

    Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo

    2015-01-01

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228

  10. Ancient DNA sequence revealed by error-correcting codes

    PubMed Central

    Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo

    2015-01-01

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228

  11. Orpinomyces cellulase celf protein and coding sequences

    DOEpatents

    Li, Xin-Liang; Chen, Huizhong; Ljungdahl, Lars G.

    2000-09-05

    A cDNA (1,520 bp), designated celF, consisting of an open reading frame (ORF) encoding a polypeptide (CelF) of 432 amino acids was isolated from a cDNA library of the anaerobic rumen fungus Orpinomyces PC-2 constructed in Escherichia coli. Analysis of the deduced amino acid sequence showed that starting from the N-terminus, CelF consists of a signal peptide, a cellulose binding domain (CBD) followed by an extremely Asn-rich linker region which separates the CBD from the catalytic domain. The latter is located at the C-terminus. The catalytic domain of CelF is highly homologous to CelA and CelC of Orpinomyces PC-2, to CelA of Neocallimastix patriciarum and also to cellobiohydrolase IIs (CBHIIs) from aerobic fungi. However, like CelA of Neocallimastix patriciarum, CelF does not have the noncatalytic repeated peptide domain (NCRPD) found in CelA and CelC from the same organism. The recombinant protein CelF hydrolyzes cellooligosaccharides in the pattern of CBHII, yielding only cellobiose as product with cellotetraose as the substrate. The genomic celF is interrupted by a 111 bp intron, located within the region coding for the CBD. The intron of the celF has features in common with genes from aerobic filamentous fungi.

  12. CROSS-DISCIPLINARY PHYSICS AND RELATED AREAS OF SCIENCE AND TECHNOLOGY: The structural analysis of protein sequences based on the quasi-amino acids code

    NASA Astrophysics Data System (ADS)

    Zhu, Ping; Tang, Xu-Qing; Xu, Zhen-Yuan

    2009-01-01

    Proteomics is the study of proteins and their interactions in a cell. With the successful completion of the Human Genome Project, the postgenome era has arrived and proteomics technology is emerging. This paper studies protein molecules from the algebraic point of view. The algebraic system (Σ, +, *) is introduced, where Σ is the set of 64 codons. According to the characteristics of (Σ, +, *), a novel quasi-amino acids code classification method is introduced and the corresponding algebraic operation table over the set ZU of the 16 kinds of quasi-amino acids is established. The internal relations among the quasi-amino acids are revealed. The results show that there exist some very close correlations between the properties of the quasi-amino acids and the codon. All these correlation relationships may play an important part in establishing the logic relationship between codons and the quasi-amino acids during the course of life origination. According to Ma F et al (2003 J. Anhui Agricultural University 30 439), the corresponding relation and the excellent properties about amino acids code are very difficult to observe. The present paper shows that (ZU, ⊕, ⊗) is a field. Furthermore, the operational results display that the codon tga has different property from other stop codons. In fact, in the human and ox mitochondrial genomes, tga codes for tryptophan rather than acting as a stop codon as in other genetic codes, as reported by Chen W C et al (2002 Acta Biophysica Sinica 18(1) 87). The present theory avoids some inexplicable events of the 20 kinds of amino acids code, in other words it solves the problem of 'the 64 codon assignments of mRNA to amino acids is probably completely wrong' proposed by Yang (2006 Progress in Modern Biomedicine 6 3).

  13. The multiple codes of nucleotide sequences.

    PubMed

    Trifonov, E N

    1989-01-01

    Nucleotide sequences carry genetic information of many different kinds, not just instructions for protein synthesis (triplet code). Several codes of nucleotide sequences are discussed including: (1) the translation framing code, responsible for correct triplet counting by the ribosome during protein synthesis; (2) the chromatin code, which provides instructions on appropriate placement of nucleosomes along the DNA molecules and their spatial arrangement; (3) a putative loop code for single-stranded RNA-protein interactions. The codes are degenerate and corresponding messages are not only interspersed but actually overlap, so that some nucleotides belong to several messages simultaneously. Tandemly repeated sequences frequently considered as functionless "junk" are found to be grouped into certain classes of repeat unit lengths. This indicates some functional involvement of these sequences. A hypothesis is formulated according to which the tandem repeats are given the role of weak enhancer-silencers that modulate, in a copy number-dependent way, the expression of proximal genes. Fast amplification and elimination of the repeats provides an attractive mechanism of species adaptation to a rapidly changing environment. PMID:2673451

  14. Nonspatial Sequence Coding in CA1 Neurons

    PubMed Central

    Allen, Timothy A.; Salz, Daniel M.; McKenzie, Sam

    2016-01-01

    The hippocampus is critical to the memory for sequences of events, a defining feature of episodic memory. However, the fundamental neuronal mechanisms underlying this capacity remain elusive. While considerable research indicates hippocampal neurons can represent sequences of locations, direct evidence of coding for the memory of sequential relationships among nonspatial events remains lacking. To address this important issue, we recorded neural activity in CA1 as rats performed a hippocampus-dependent sequence-memory task. Briefly, the task involves the presentation of repeated sequences of odors at a single port and requires rats to identify each item as “in sequence” or “out of sequence”. We report that, while the animals' location and behavior remained constant, hippocampal activity differed depending on the temporal context of items—in this case, whether they were presented in or out of sequence. Some neurons showed this effect across items or sequence positions (general sequence cells), while others exhibited selectivity for specific conjunctions of item and sequence position information (conjunctive sequence cells) or for specific probe types (probe-specific sequence cells). We also found that the temporal context of individual trials could be accurately decoded from the activity of neuronal ensembles, that sequence coding at the single-cell and ensemble level was linked to sequence memory performance, and that slow-gamma oscillations (20–40 Hz) were more strongly modulated by temporal context and performance than theta oscillations (4–12 Hz). These findings provide compelling evidence that sequence coding extends beyond the domain of spatial trajectories and is thus a fundamental function of the hippocampus. SIGNIFICANCE STATEMENT The ability to remember the order of life events depends on the hippocampus, but the underlying neural mechanisms remain poorly understood. Here we addressed this issue by recording neural activity in hippocampal

  15. CodingMotif: exact determination of overrepresented nucleotide motifs in coding sequences

    PubMed Central

    2012-01-01

    Background It has been increasingly appreciated that coding sequences harbor regulatory sequence motifs in addition to encoding for protein. These sequence motifs are expected to be overrepresented in nucleotide sequences bound by a common protein or small RNA. However, detecting overrepresented motifs has been difficult because of interference by constraints at the protein level. Sampling-based approaches to solve this problem based on codon-shuffling have been limited to exploring only an infinitesimal fraction of the sequence space and by their use of parametric approximations. Results We present a novel O(N(log N)^2)-time algorithm, CodingMotif, to identify nucleotide-level motifs of unusual copy number in protein-coding regions. Using a new dynamic programming algorithm we are able to exhaustively calculate the distribution of the number of occurrences of a motif over all possible coding sequences that encode the same amino acid sequence, given a background model for codon usage and dinucleotide biases. Our method takes advantage of the sparseness of loci where a given motif can occur, greatly speeding up the required convolution calculations. Knowledge of the distribution allows one to assess the exact non-parametric p-value of whether a given motif is over- or under- represented. We demonstrate that our method identifies known functional motifs more accurately than sampling and parametric-based approaches in a variety of coding datasets of various size, including ChIP-seq data for the transcription factors NRSF and GABP. Conclusions CodingMotif provides a theoretically and empirically-demonstrated advance for the detection of motifs overrepresented in coding sequences. We expect CodingMotif to be useful for identifying motifs in functional genomic datasets such as DNA-protein binding, RNA-protein binding, or microRNA-RNA binding within coding regions. A software implementation is available at http://bioinformatics.bc.edu/chuanglab/codingmotif.tar PMID

  16. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    SciTech Connect

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in

  17. Composition for nucleic acid sequencing

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2008-08-26

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  18. Coding visual features extracted from video sequences.

    PubMed

    Baroffio, Luca; Cesana, Matteo; Redondi, Alessandro; Tagliasacchi, Marco; Tubaro, Stefano

    2014-05-01

    Visual features are successfully exploited in several applications (e.g., visual search, object recognition and tracking, etc.) due to their ability to efficiently represent image content. Several visual analysis tasks require features to be transmitted over a bandwidth-limited network, thus calling for coding techniques to reduce the required bit budget, while attaining a target level of efficiency. In this paper, we propose, for the first time, a coding architecture designed for local features (e.g., SIFT, SURF) extracted from video sequences. To achieve high coding efficiency, we exploit both spatial and temporal redundancy by means of intraframe and interframe coding modes. In addition, we propose a coding mode decision based on rate-distortion optimization. The proposed coding scheme can be conveniently adopted to implement the analyze-then-compress (ATC) paradigm in the context of visual sensor networks. That is, sets of visual features are extracted from video frames, encoded at remote nodes, and finally transmitted to a central controller that performs visual analysis. This is in contrast to the traditional compress-then-analyze (CTA) paradigm, in which video sequences acquired at a node are compressed and then sent to a central unit for further processing. In this paper, we compare these coding paradigms using metrics that are routinely adopted to evaluate the suitability of visual features in the context of content-based retrieval, object recognition, and tracking. Experimental results demonstrate that, thanks to the significant coding gains achieved by the proposed coding scheme, ATC outperforms CTA with respect to all evaluation metrics. PMID:24818244

  19. High compression image and image sequence coding

    NASA Technical Reports Server (NTRS)

    Kunt, Murat

    1989-01-01

    The digital representation of an image requires a very large number of bits. This number is even larger for an image sequence. The goal of image coding is to reduce this number, as much as possible, and reconstruct a faithful duplicate of the original picture or image sequence. Early efforts in image coding, solely guided by information theory, led to a plethora of methods. The compression ratio reached a plateau around 10:1 a couple of years ago. Recent progress in the study of the brain mechanism of vision and scene analysis has opened new vistas in picture coding. Directional sensitivity of the neurones in the visual pathway combined with the separate processing of contours and textures has led to a new class of coding methods capable of achieving compression ratios as high as 100:1 for images and around 300:1 for image sequences. Recent progress on some of the main avenues of object-based methods is presented. These second generation techniques make use of contour-texture modeling, new results in neurophysiology and psychophysics and scene analysis.

  20. The Cipher Code of Simple Sequence Repeats in "Vampire Pathogens".

    PubMed

    Zou, Geng; Bello-Orti, Bernardo; Aragon, Virginia; Tucker, Alexander W; Luo, Rui; Ren, Pinxing; Bi, Dingren; Zhou, Rui; Jin, Hui

    2015-01-01

    Blood inside mammals is a forbidden area for the majority of prokaryotic microbes; however, red blood cells tropism microbes, like "vampire pathogens" (VP), succeed in matching scarce nutrients and surviving strong immunity reactions. Here, we found VP of Mycoplasma, Rhizobiales, and Rickettsiales showed significantly higher counts of (AG)n dimeric simple sequence repeats (Di-SSRs) in the genomes, coding and non-coding regions than non Vampire Pathogens (N_VP). Regression analysis indicated a significant correlation between GC content and the span of (AG)n-Di-SSR variation. Gene Ontology (GO) terms with abundance of (AG)3-Di-SSRs shared by the VP strains were associated with purine nucleotide metabolism (FDR < 0.01), indicating an adaptation to the limited availability of purine and nucleotide precursors in blood. Di-amino acids coded by (AG)n-Di-SSRs included all three six-fold code amino acids (Arg, Leu and Ser) and significantly higher counts of Di-amino acids coded by (AG)3, (GA)3, and (TC)3 in VP than N_VP. Furthermore, significant differences (P < 0.001) on the numbers of triplexes formed from (AG)n-Di-SSRs between VP and N_VP in Mycoplasma suggested the potential role of (AG)n-Di-SSRs in gene regulation. PMID:26215592
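
    Counting (AG)n dimeric repeats of the kind described in this abstract can be sketched with a regular expression. This is only an illustration under stated assumptions: the function name, the default threshold of three tandem copies, and the toy sequence are hypothetical, and the abstract does not say how the authors' pipeline defines or counts a repeat.

```python
import re


def count_dimer_repeats(seq, unit="AG", min_repeats=3):
    """Count non-overlapping runs of at least `min_repeats` tandem copies of `unit`."""
    # Builds a pattern such as (?:AG){3,}; each maximal run is counted once.
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_repeats},}}")
    return len(pattern.findall(seq.upper()))


# Hypothetical toy sequence: runs of 3, 2 and 4 copies of AG.
toy = "TT" + "AG" * 3 + "CC" + "AG" * 2 + "TT" + "AG" * 4
print(count_dimer_repeats(toy))
```

    Only the runs of three and four copies meet the default threshold in the toy sequence; passing a different `unit` (for example "GA" or "TC") counts the other repeat classes mentioned in the abstract.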

  1. High speed nucleic acid sequencing

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescent resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.

  2. Whole Genome Sequencing: Cracking the Genetic Code for Foodborne Illness

    MedlinePlus

    ... have millions of different genomes, or sequences of genetic code, each as unique as a fingerprint. Get ...

  3. Hybrid ARQ schemes employing coded modulation and sequence combining

    NASA Astrophysics Data System (ADS)

    Deng, Robert H.

    1994-06-01

    We propose and analyze two hybrid automatic-repeat-request (ARQ) schemes employing bandwidth efficient coded modulation and coded sequence combining. In the first scheme, a trellis-coded modulation (TCM) is used to control channel noise; while in the second scheme a concatenated coded modulation is employed. The concatenated coded modulation is formed by cascading a Reed-Solomon (RS) outer code and a coded modulation (BCM) inner code. In both schemes, the coded modulation decoder, by performing sequence combining and soft-decision maximum likelihood decoding, makes full use of the information available in all received sequences corresponding to a given information message. It is shown, by means of analysis as well as computer simulations, that both schemes are capable of providing high throughput efficiencies over a wide range of signal-to-noise ratios. The schemes are suitable for large file transfers over satellite communication links where high throughput and high reliability are required.

  4. SEQassembly: A Practical Tools Program for Coding Sequences Splicing

    NASA Astrophysics Data System (ADS)

    Lee, Hongbin; Yang, Hang; Fu, Lei; Qin, Long; Li, Huili; He, Feng; Wang, Bo; Wu, Xiaoming

    A CDS (coding sequence) is the portion of an mRNA sequence that is composed of a number of exon sequence segments. Constructing the CDS is important for in-depth genetic analysis such as genotyping. A program in the MATLAB environment is presented, which can process batches of sample sequences into code segments under the guidance of reference exon models, and splice the code segments from the same sample source into a CDS according to the exon order in a queue file. This program is useful in transcriptional polymorphism detection and gene function study.
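
    The splicing step described here, joining the exon segments of one sample in a prescribed order, can be sketched as follows. The original program is written in MATLAB and its interface is not given in the abstract, so this Python sketch is only an assumption-laden illustration: the function name, the dictionary layout, and the exon labels are all hypothetical.

```python
def splice_cds(segments, exon_order):
    """Join the exon segments of one sample into a CDS in the given exon order.

    `segments` maps a hypothetical exon label to its sequence; `exon_order`
    plays the role of the exon order listed in the queue file.
    """
    missing = [name for name in exon_order if name not in segments]
    if missing:
        raise KeyError(f"missing exon segments: {missing}")
    return "".join(segments[name] for name in exon_order)


# Hypothetical sample whose segments were read in arbitrary order.
sample = {"exon2": "GCC", "exon1": "ATG", "exon3": "TAA"}
print(splice_cds(sample, ["exon1", "exon2", "exon3"]))
```

    Raising an error on a missing exon, rather than silently skipping it, avoids producing a frame-shifted CDS without warning.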

  5. Using Huffman coding method to visualize and analyze DNA sequences.

    PubMed

    Qi, Zhao-Hui; Li, Ling; Qi, Xiao-Qin

    2011-11-30

    On the basis of the Huffman coding method, we propose a new graphical representation of DNA sequence. The representation can avoid degeneracy and loss of information in the transfer of data from a DNA sequence to its graphical representation. Then a multicomponent vector from the representation is introduced to characterize quantitatively DNA sequences. The components of the vector are derived from the graphical representation of DNA primary sequence. The examination of similarities and dissimilarities among the complete coding sequences of β-globin gene of 11 species and six ND6 proteins shows the utility of the scheme. PMID:21953557
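
    The abstract does not specify how Huffman codes are mapped onto a graphical representation, so the sketch below covers only the standard Huffman step: deriving a prefix-free binary code for each nucleotide from its frequency in a sequence. The function name and the toy sequence are hypothetical, and the tie-breaking rule is an arbitrary choice made here for reproducibility.

```python
import heapq
from collections import Counter


def huffman_codes(seq):
    """Build a Huffman code (symbol -> bit string) from symbol frequencies in `seq`."""
    counts = Counter(seq)
    if not counts:
        return {}
    # Heap entries: (frequency, unique tie-breaker, partial code table).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single distinct symbol still needs one bit
        return {sym: "0" for sym in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        n1, _, low = heapq.heappop(heap)   # least frequent subtree
        n2, _, high = heapq.heappop(heap)  # next least frequent subtree
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (n1 + n2, next_id, merged))
        next_id += 1
    return heap[0][2]


toy = "AAAAAAAACCCCGGT"  # hypothetical sequence: A x8, C x4, G x2, T x1
codes = huffman_codes(toy)
print(codes)
print(sum(len(codes[base]) for base in toy), "bits in total")
```

    Because no code word is a prefix of another, the encoded bit string can be decoded unambiguously, which is consistent with the abstract's claim that the representation avoids loss of information.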

  6. Variation in Seed Fatty Acid Composition, and Sequence Divergence in the FAD2 Gene Coding Region between Wild and Cultivated Sesame

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Sesame germplasm harbors genetic diversity which can be useful for sesame improvement in breeding programs. Seven accessions with different levels of oleic acid were selected from the entire USDA sesame germplasm collection (1232 accessions) and planted for morphological observation and re-examinati...

  7. Amino acid fermentation at the origin of the genetic code

    PubMed Central

    2012-01-01

    There is evidence that the genetic code was established prior to the existence of proteins, when metabolism was powered by ribozymes. Also, early proto-organisms had to rely on simple anaerobic bioenergetic processes. In this work I propose that amino acid fermentation powered metabolism in the RNA world, and that this was facilitated by proto-adapters, the precursors of the tRNAs. Amino acids were used as carbon sources rather than as catalytic or structural elements. In modern bacteria, amino acid fermentation is known as the Stickland reaction. This pathway involves two amino acids: the first undergoes oxidative deamination, and the second acts as an electron acceptor through reductive deamination. This redox reaction results in two keto acids that are employed to synthesise ATP via substrate-level phosphorylation. The Stickland reaction is the basic bioenergetic pathway of some bacteria of the genus Clostridium. Two other facts support Stickland fermentation in the RNA world. First, several Stickland amino acid pairs are synthesised in abiotic amino acid synthesis. This suggests that amino acids that could be used as an energy substrate were freely available. Second, anticodons that have complementary sequences often correspond to amino acids that form Stickland pairs. The main hypothesis of this paper is that pairs of complementary proto-adapters were assigned to Stickland amino acids pairs. There are signatures of this hypothesis in the genetic code. Furthermore, it is argued that the proto-adapters formed double strands that brought amino acid pairs into proximity to facilitate their mutual redox reaction, structurally constraining the anticodon pairs that are assigned to these amino acid pairs. Significance tests which randomise the code are performed to study the extent of the variability of the energetic (ATP) yield. Random assignments can lead to a substantial yield of ATP and maintain enough variability, thus selection can act and refine the assignments

  8. Chip-based sequencing nucleic acids

    DOEpatents

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  9. FRAGS: estimation of coding sequence substitution rates from fragmentary data

    PubMed Central

    Swart, Estienne C; Hide, Winston A; Seoighe, Cathal

    2004-01-01

    Background: Rates of substitution in protein-coding sequences can provide important insights into evolutionary processes that are of biomedical and theoretical interest. Increased availability of coding sequence data has enabled researchers to estimate more accurately the coding sequence divergence of pairs of organisms. However, the use of different data sources, alignment protocols and methods to estimate substitution rates leads to widely varying estimates of key parameters that define the coding sequence divergence of orthologous genes. Although complete genome sequence data are not available for all organisms, fragmentary sequence data can provide accurate estimates of substitution rates provided that an appropriate and consistent methodology is used and that differences in the estimates obtainable from different data sources are taken into account. Results: We have developed FRAGS, an application framework that uses existing, freely available software components to construct in-frame alignments and estimate coding substitution rates from fragmentary sequence data. Coding sequence substitution estimates for human and chimpanzee sequences, generated by FRAGS, reveal that methodological differences can give rise to significantly different estimates of important substitution parameters. The estimated substitution rates were also used to infer upper bounds on the amount of sequencing error in the datasets that we have analysed. Conclusion: We have developed a system that performs robust estimation of substitution rates for orthologous sequences from a pair of organisms. Our system can be used when fragmentary genomic or transcript data is available from one of the organisms and the other is a completely sequenced genome within the Ensembl database. As well as estimating substitution statistics, our system enables the user to manage and query alignment and substitution data. PMID:15005802

  10. Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts.

    PubMed

    Sun, Liang; Luo, Haitao; Bu, Dechao; Zhao, Guoguang; Yu, Kuntao; Zhang, Changhai; Liu, Yuanning; Chen, Runsheng; Zhao, Yi

    2013-09-01

    It is a challenge to classify transcripts as protein-coding or non-coding, especially those reconstructed from high-throughput sequencing data of poorly annotated species. This study developed and evaluated a powerful signature tool, Coding-Non-Coding Index (CNCI), which profiles adjoining nucleotide triplets to effectively distinguish protein-coding and non-coding sequences independent of known annotations. CNCI is effective for classifying incomplete transcripts and sense-antisense pairs. The implementation of CNCI offered highly accurate classification of transcripts assembled from whole-transcriptome sequencing data in a cross-species manner, demonstrated gene evolutionary divergence between vertebrates and invertebrates, or between plants, and provided a long non-coding RNA catalog of orangutan. CNCI software is available at http://www.bioinfo.org/software/cnci. PMID:23892401

  11. Correlation approach to identify coding regions in DNA sequences

    NASA Technical Reports Server (NTRS)

    Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1994-01-01

    Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.

  12. Molecular evolution of coding and non-coding sequences of the growth hormone receptor (GHR) gene in the family Bovidae.

    PubMed

    Maj, Andrzej; Zwierzchowski, Lech

    2006-01-01

    The GHR gene exon 1A and exon 4 with fragments of its flanking introns were sequenced in twelve Bovidae species and the obtained sequences were aligned and analysed by the ClustalW method. In coding exon 4 only three interspecies differences were found, one of which had an effect on the amino-acid sequence--leucine 152 proline. The average mutation frequency in non-coding exon 1A was 10.5 per 100 bp, and was 4.6-fold higher than that in coding exon 4 (2.3 per 100 bp). The mutation frequency in intron sequences was similar to that in non-coding exon 1A (8.9 vs 10.5/100 bp). For non-coding exon 1A, the mutation levels were lower within than between the subfamilies Bovinae and Caprinae. Exon 4 was 100% identical within the genera Ovis, Capra, Bison, and Bos and 97.7% identical for Ovibos moschatus, Ammotragus lervia and Bovinae species. The identity level of non-coding exon 1A of the GHR gene was 93.8% between species belonging to Bovinae and Caprinae. The average mutation rate was 0.2222/100 bp/MY and 0.0513/100 bp/MY for the Bovidae GHR gene exons 1A and 4, respectively. Thus, the GHR gene is well conserved in the Bovidae family. Also, in this study some novel intraspecies polymorphisms were found for cattle and sheep. PMID:17044257

  13. Distinguishing Proteins From Arbitrary Amino Acid Sequences

    PubMed Central

    Yau, Stephen S.-T.; Mao, Wei-Guang; Benson, Max; He, Rong Lucy

    2015-01-01

    What kinds of amino acid sequences could possibly be protein sequences? From all existing databases that we can find, known proteins are only a small fraction of all possible combinations of amino acids. Beginning with Sanger's first detailed determination of a protein sequence in 1952, previous studies have focused on describing the structure of existing protein sequences in order to construct the protein universe. No one, however, has developed a criterion for determining whether an arbitrary amino acid sequence can be a protein. Here we show that when the collection of arbitrary amino acid sequences is viewed in an appropriate geometric context, the protein sequences cluster together. This leads to a new computational test, described here, that has proved to be remarkably accurate at determining whether an arbitrary amino acid sequence can be a protein. Even more, if the results of this test indicate that the sequence can be a protein, and it is indeed a protein sequence, then its identity as a protein sequence is uniquely defined. We anticipate our computational test will be useful for those who are attempting to complete the job of discovering all proteins, or constructing the protein universe. PMID:25609314

  14. High-speed Viterbi decoding with overlapping code sequences

    NASA Technical Reports Server (NTRS)

    Ross, Michael D.; Osborne, William P.

    1993-01-01

    The Viterbi Algorithm for decoding convolutional codes and Trellis Coded Modulation is suited to VLSI implementation but contains a feedback loop which limits the speed of pipelined architecture. The feedback loop is circumvented by decoding independent sequences simultaneously, resulting in a 5-9 fold speed-up with a two-fold hardware expansion.

  15. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-05-30

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  16. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-06-06

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  17. An operational RNA code for amino acids and possible relationship to genetic code.

    PubMed Central

    Schimmel, P; Giegé, R; Moras, D; Yokoyama, S

    1993-01-01

    RNA helical oligonucleotides that recapitulate the acceptor stems of transfer RNAs, and that are devoid of the anticodon trinucleotides of the genetic code, are aminoacylated by aminoacyl tRNA synthetases. The specificity of aminoacylation is sequence dependent, and both specificity and efficiency are generally determined by only a few nucleotides proximal to the amino acid attachment site. This sequence/structure-dependent aminoacylation of RNA oligonucleotides constitutes an operational RNA code for amino acids. To a rough approximation, members of the two different classes of tRNA synthetases are, like tRNAs, organized into two major domains. The class-defining conserved domain containing the active site incorporates determinants for recognition of RNA mini-helix substrates. This domain may reflect the primordial synthetase, which was needed for expression of the operational RNA code. The second synthetase domain, which generally is less or not conserved, provides for interactions with the second domain of tRNA, which incorporates the anticodon. The emergence of the genetic from the operational RNA code could occur when the second domain of synthetases was added with the anticodon-containing domain of tRNAs. PMID:7692438

  18. Detecting frame shifts by amino acid sequence comparison.

    PubMed

    Claverie, J M

    1993-12-20

    Various amino acid substitution scoring matrices are used in conjunction with local alignments programs to detect regions of similarity and infer potential common ancestry between proteins. The usual scoring schemes derive from the implicit hypothesis that related proteins evolve from a common ancestor by the accumulation of point mutations and that amino acids tend to be progressively substituted by others with similar properties. However, other frequent single mutation events, like nucleotide insertion or deletion and gene inversion, change the translation reading frame and cause previously encoded amino acid sequences to become unrecognizable at once. Here, I derive five new types of scoring matrix, each capable of detecting a specific frame shift (deletion, insertion and inversion in 3 frames) and use them with a regular local alignments program to detect amino acid sequences that may have derived from alternative reading frames of the same nucleotide sequence. Frame shifts are inferred from the sole comparison of the protein sequences. The five scoring matrices were used with the BLASTP program to compare all the protein sequences in the Swissprot database. Surprisingly, the searches revealed hundreds of highly significant frame shift matches, of which many are likely to represent sequencing errors. Others provide some evidence that frame shift mutations might be used in protein evolution as a way to create new amino acid sequences from pre-existing coding regions. PMID:7903399
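
    The effect described above, where a single-nucleotide insertion or deletion makes the downstream protein resemble an alternative reading frame of the original gene, can be illustrated with a minimal Python sketch. It is not the scoring-matrix method of the abstract; it only translates a toy DNA string in the three forward frames using the standard genetic code. The function names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate dna starting at offset `frame` (0, 1 or 2); '*' marks a stop.

    Codons containing characters other than A, C, G, T become 'X'.
    A trailing incomplete codon is ignored.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def forward_frames(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


if __name__ == "__main__":
    original = "ATGGCCAAATGA"                # hypothetical toy gene
    shifted = original[:3] + original[4:]    # delete one base after the start codon
    print(forward_frames(original))
    print(translate(shifted))
```

    In this toy case the protein translated from the deletion mutant shares its residues after the start codon with the second forward frame of the original sequence, which is the kind of similarity a frame-shift-aware comparison would need to detect.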

  19. Sequence of the Ampullariella sp. strain 3876 gene coding for xylose isomerase.

    PubMed Central

    Saari, G C; Kumar, A A; Kawasaki, G H; Insley, M Y; O'Hara, P J

    1987-01-01

    The nucleotide sequence of the gene coding for xylose isomerase from Ampullariella sp. strain 3876, a gram-positive bacterium, has been determined. A clone of a fragment of strain 3876 DNA coding for a xylose isomerase activity was identified by its ability to complement a xylose isomerase-defective Escherichia coli strain. One such complementation positive fragment, 2,922 nucleotides in length, was sequenced in its entirety. There are two open reading frames 1,182 and 1,242 nucleotides in length, on opposite strands of this fragment, each of which could code for a protein the expected size of xylose isomerase. The 1,182-nucleotide open reading frame was identified as the coding sequence for the protein from the sequence analysis of the amino-terminal region and selected internal peptides. The gene initiates with GTG and has a high guanine and cytosine content (70%) and an exceptionally strong preference (97%) for guanine or cytosine in the third position of the codons. The gene codes for a 43,210-dalton polypeptide composed of 393 amino acids. The xylose isomerase from Ampullariella sp. strain 3876 is similar in size to other bacterial xylose isomerases and has limited amino acid sequence homology to the available sequences from E. coli, Bacillus subtilis, and Streptomyces violaceus-ruber. In all cases yet studied, the bacterial gene for xylulose kinase is downstream from the gene for xylose isomerase. We present evidence suggesting that in Ampullariella sp. strain 3876 these genes are similarly arranged. PMID:3027039
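
    The composition statistics quoted above (overall G+C content and G+C at the third codon position) can be computed as in the following minimal Python sketch. The function names and the toy sequence are hypothetical, and the sketch assumes an in-frame coding sequence given as a plain string; it is not the authors' analysis.

```python
def gc_fraction(seq):
    """Fraction of G or C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_fraction(cds):
    """Fraction of G or C at the third position of each complete codon.

    Assumes `cds` starts in frame at the first base of the first codon.
    """
    third_positions = cds.upper()[2::3]
    if not third_positions:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in third_positions) / len(third_positions)


def encoded_residues(orf_length_nt):
    """Residues encoded by an ORF whose length includes the stop codon (assumption)."""
    return orf_length_nt // 3 - 1


if __name__ == "__main__":
    toy_cds = "GTGGCCGACTGA"  # hypothetical toy sequence starting with GTG
    print(gc_fraction(toy_cds), gc3_fraction(toy_cds))
    print(encoded_residues(1182))
```

    Under the stated assumption, a 1,182-nucleotide reading frame corresponds to 394 codons, that is, 393 residues plus a stop codon, which is consistent with the 393 amino acids reported in the abstract.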

  20. The Coding and Inter-Manual Transfer of Movement Sequences

    PubMed Central

    Shea, Charles H.; Kovacs, Attila J.; Panzer, Stefan

    2011-01-01

    The manuscript reviews recent experiments that use inter-manual transfer and inter-manual practice paradigms to determine the coordinate system (visual–spatial or motor) used in the coding of movement sequences during physical and observational practice. The results indicated that multi-element movement sequences are more effectively coded in visual–spatial coordinates even following extended practice, while very early in practice movement sequences with only a few movement elements and relatively short durations are coded in motor coordinates. Likewise, inter-manual practice of relatively simple movement sequences show benefits of right and left limb practice that involves the same motor coordinates while the opposite is true for more complex sequences. The results suggest that the coordinate system used to code the sequence information is linked to both the task characteristics and the control processes used to produce the sequence. These findings have the potential to greatly enhance our understanding of why in some conditions participants following practice with one limb or observation of one limb practice can effectively perform the task with the contralateral limb while in other (often similar) conditions cannot. PMID:21716583

  1. Streamlined Genome Sequence Compression using Distributed Source Coding

    PubMed Central

    Wang, Shuang; Jiang, Xiaoqian; Chen, Feng; Cui, Lijuan; Cheng, Samuel

    2014-01-01

    We aim at developing a streamlined genome sequence compression algorithm to support alternative miniaturized sequencing devices, which have limited communication, storage, and computation power. Existing techniques that require a heavy client (encoder side) cannot be applied. To tackle this challenge, we carefully examined distributed source coding theory and developed a customized reference-based genome compression protocol to meet the low-complexity need at the client side. Based on the variation between source and reference, our protocol will pick adaptively either syndrome coding or hash coding to compress subsequences of changing code length. Our experimental results showed promising performance of the proposed method when compared with the state-of-the-art algorithm (GRS). PMID:25520552

  2. ICDS database: interrupted CoDing sequences in prokaryotic genomes.

    PubMed

    Perrodou, Emmanuel; Deshayes, Caroline; Muller, Jean; Schaeffer, Christine; Van Dorsselaer, Alain; Ripp, Raymond; Poch, Olivier; Reyrat, Jean-Marc; Lecompte, Odile

    2006-01-01

    Unrecognized frameshifts, in-frame stop codons and sequencing errors lead to Interrupted CoDing Sequence (ICDS) that can seriously affect all subsequent steps of functional characterization, from in silico analysis to high-throughput proteomic projects. Here, we describe the Interrupted CoDing Sequence database containing ICDS detected by a similarity-based approach in 80 complete prokaryotic genomes. ICDS can be retrieved by species browsing or similarity searches via a web interface (http://www-bio3d-igbmc.u-strasbg.fr/ICDS/). The definition of each interrupted gene is provided as well as the ICDS genomic localization with the surrounding sequence. Furthermore, to facilitate the experimental characterization of ICDS, we propose optimized primers for re-sequencing purposes. The database will be regularly updated with additional data from ongoing sequenced genomes. Our strategy has been validated by three independent tests: (i) ICDS prediction on a benchmark of artificially created frameshifts, (ii) comparison of predicted ICDS and results obtained from the comparison of the two genomic sequences of Bacillus licheniformis strain ATCC 14580 and (iii) re-sequencing of 25 predicted ICDS of the recently sequenced genome of Mycobacterium smegmatis. This allows us to estimate the specificity and sensitivity (95 and 82%, respectively) of our program and the efficiency of primer determination. PMID:16381882

  3. Statistical analysis of the distribution of amino acids in Borrelia burgdorferi genome under different genetic codes

    NASA Astrophysics Data System (ADS)

    García, José A.; Alvarez, Samantha; Flores, Alejandro; Govezensky, Tzipe; Bobadilla, Juan R.; José, Marco V.

    2004-10-01

    The genetic code is considered to be universal. In order to test if some statistical properties of the coding bacterial genome were due to inherent properties of the genetic code, we compared the autocorrelation function, the scaling properties and the maximum entropy of the distribution of distances of amino acids in sequences obtained by translating protein-coding regions from the genome of Borrelia burgdorferi, under different genetic codes. Overall, our results indicate that these properties are very stable to perturbations made by altering the genetic code. We also discuss the likely evolutionary implications of the present results.

  4. The primordial sequence, ribosomes, and the genetic code.

    NASA Technical Reports Server (NTRS)

    Fox, S. W.; Yuki, A.; Waehneldt, T. V.; Lacey, J. C., Jr.

    1971-01-01

    Experimental investigation of the key question of the origin of life concerning the chronological order in the primordial sequence of nucleic acid, protein, and cell. It is pointed out that, when viewed against the background of experiments on the selective reaction of basic homopolyamino acids with mononucleotides (Lacey and Pruitt, 1969; Woese, 1968), the experiments made help to establish a basis for understanding how information originally flowed from proteins to nucleic acids.

  5. Machine-Checked Sequencer for Critical Embedded Code Generator

    NASA Astrophysics Data System (ADS)

    Izerrouken, Nassima; Pantel, Marc; Thirioux, Xavier

    This paper presents the development of a correct-by-construction block sequencer for GeneAuto, a qualifiable (according to the DO178B/ED12B recommendation) automatic code generator. It transforms Simulink models to MISRA C code for safety-critical systems. Our approach, which combines a classical development process with formal specification and verification using proof assistants, led to preliminary fruitful exchanges with certification authorities. We present parts of the classical user and tool requirements and the derived formal specifications, implementation and verification for the correctness and termination of the block sequencer. This sequencer has been successfully applied to real-size industrial use cases from various transportation domain partners and led to the detection of requirement errors and a correct-by-construction implementation.

  6. Radio frequency interference effect on PN code sequence lock detector

    NASA Technical Reports Server (NTRS)

    Kwon, Hyuck M.; Tu, Kwei; Loh, Y. C.

    1991-01-01

    The authors find the probabilities of detection and false alarm of the pseudonoise (PN) sequence code lock detector when strong radio frequency interference (RFI) hits the communications link. Both a linear model and a soft-limiter nonlinear model for a transponder receiver are considered. In addition, both continuous wave (CW) RFI and pulse RFI are analyzed, and a discussion is included of how strong CW RFI can knock out the PN code lock detector in a linear or a soft-limiter transponder. As an example, the Space Station Freedom forward S-band PN system is evaluated. It is shown that a soft-limiter transponder can protect the PN code lock detector against a typical pulse RFI, but it can degrade the PN code lock detector performance more than a linear transponder if CW RFI hits the link.

  7. Amino-Acid Sequence of Porcine Pepsin

    PubMed Central

    Tang, J.; Sepulveda, P.; Marciniszyn, J.; Chen, K. C. S.; Huang, W-Y.; Tao, N.; Liu, D.; Lanier, J. P.

    1973-01-01

    As the culmination of several years of experiments, we propose a complete amino-acid sequence for porcine pepsin, an enzyme containing 327 amino-acid residues in a single polypeptide chain. In the sequence determination, the enzyme was treated with cyanogen bromide. Five resulting fragments were purified. The amino-acid sequence of four of the fragments accounted for 290 residues. Because the structure of a 37-residue carboxyl-terminal fragment was already known, it was not studied. The alignment of these fragments was determined from the sequence of methionyl-peptides we had previously reported. We also discovered the locations of active-site aspartyl residues, as well as the pairing of the three disulfide bridges. A minor component of commercial crystalline pepsin was found to contain two extra amino-acid residues, Ala-Leu-, at the amino-terminus of the molecule. This minor component was apparently derived from a different site of cleavage during the activation of porcine pepsinogen. PMID:4587252

  8. Evolution of codes, crosstalk, and sequence niches in biomolecular signaling

    NASA Astrophysics Data System (ADS)

    Myers, Christopher

    2007-03-01

    Signaling and regulation in cellular networks are mediated through biomolecular interactions, which can be somewhat promiscuous, involving the molecular recognition of broad sets of binding targets. This leads to some basic questions concerning crosstalk among similar sets of biomolecules: does it occur, to what extent can it be avoided, how can phenotypic errors due to crosstalk be minimized, and when might crosstalk be advantageous? Beyond biology, questions of this sort have connections to phase transitions in constraint satisfaction problems, and to the theory of message coding in noisy channels. Expanding upon my previous work exploring the nature of the satisfiability (SAT-UNSAT) transition in a simple model of protein-protein interactions, this talk will investigate the role of sequence evolution in shaping high-dimensional sequence niches and biomolecular codes.

  9. Sequence and Structural Analyses for Functional Non-coding RNAs

    NASA Astrophysics Data System (ADS)

    Sakakibara, Yasubumi; Sato, Kengo

    Analysis and detection of functional RNAs are currently important topics in both molecular biology and bioinformatics research. Several computational methods based on stochastic context-free grammars (SCFGs) have been developed for modeling and analysing functional RNA sequences. These grammatical methods have succeeded in modeling typical secondary structures of RNAs and are used for structural alignments of RNA sequences. Such stochastic models, however, are not sufficient to discriminate member sequences of an RNA family from non-members, and hence to detect non-coding RNA regions from genome sequences. Recently, the support vector machine (SVM) and kernel function techniques have been actively studied and proposed as a solution to various problems in bioinformatics. SVMs are trained from positive and negative samples and have strong, accurate discrimination abilities, and hence are more appropriate for the discrimination tasks. A few kernel functions that extend the string kernel to measure the similarity of two RNA sequences from the viewpoint of secondary structures have been proposed. In this article, we give an overview of recent progress in SCFG-based methods for RNA sequence analysis and novel kernel functions tailored to measure the similarity of two RNA sequences and developed for use with support vector machines (SVM) in discriminating members of an RNA family from non-members.

  10. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe.

  11. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-07-21

    A method is disclosed for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe. 11 figs.

  12. Structure of the gene coding for the sequence-specific DNA-methyltransferase of the B. subtilis phage SPR.

    PubMed Central

    Pósfai, G; Baldauf, F; Erdei, S; Pósfai, J; Venetianer, P; Kiss, A

    1984-01-01

    The nucleotide sequence of the gene coding for the 5'-GGCC and 5'-CCGG specific DNA methyltransferase of the Bacillus subtilis phage SPR was determined by the Maxam-Gilbert procedure. Transcriptional and translational signals of the sequence were assigned with the help of S1 mapping and translation in E. coli minicells. The gene codes for a 49 kd polypeptide. The amino acid sequence of the SPR methylase shows regions of homology with the sequence of the 5'-GGCC-specific BspRI modification methylase. PMID:6096817

  13. Multifractal detrended cross-correlation analysis of coding and non-coding DNA sequences through chaos-game representation

    NASA Astrophysics Data System (ADS)

    Pal, Mayukha; Satish, B.; Srinivas, K.; Rao, P. Madhusudana; Manimaran, P.

    2015-10-01

    We propose a new approach combining the chaos game representation and the two dimensional multifractal detrended cross correlation analysis methods to examine multifractal behavior in power law cross correlation between any pair of nucleotide sequences of unequal lengths. In this work, we analyzed the characteristic behavior of coding and non-coding DNA sequences of eight prokaryotes. The results show the presence of strong multifractal nature between coding and non-coding sequences of all data sets. We found that this integrative approach helps us to consider complete DNA sequences for characterization, and further it may be useful for classification, clustering, identification of class affiliation of nucleotide sequences etc. with high precision.
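
    The chaos game representation mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the corner assignment for A, C, G and T and the helper names `cgr_points` and `cgr_matrix` are assumptions made here. Each base moves the current point halfway toward its corner, and binning the points on a 2^k x 2^k grid yields a fixed-size matrix whatever the sequence length, which is presumably what allows sequences of unequal lengths to be compared.

```python
# Corner assignment is an assumption; several conventions are in use.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the chaos-game trajectory of a DNA sequence in the unit square."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # skip ambiguous symbols such as N
        # Move halfway from the current point toward the corner of this base.
        x = (x + corner[0]) / 2
        y = (y + corner[1]) / 2
        points.append((x, y))
    return points


def cgr_matrix(seq, k=3):
    """Bin the trajectory on a 2**k x 2**k grid and return the counts."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(seq):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid
```

    Two sequences of different lengths passed through `cgr_matrix` with the same `k` give matrices of identical shape, which a two-dimensional analysis can then take as input.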

  14. Key for protein coding sequences identification: computer analysis of codon strategy.

    PubMed Central

    Rodier, F; Gabarro-Arpa, J; Ehrlich, R; Reiss, C

    1982-01-01

    The signal qualifying an AUG or GUG as an initiator in mRNAs processed by E. coli ribosomes is not found to be a systematic, literal homology sequence. In contrast, stability analysis reveals that initiators always occur within nucleic acid domains of low stability, for which a high A/U content is observed. Since no aminoacid selection pressure can be detected at N-termini of the proteins, the A/U enrichment results from a biased usage of the code degeneracy. A computer analysis is presented which allows easy detection of the codon strategy. N-terminal codons carry rather systematically A or U in third position, which suggests a mechanism for translation initiation and helps to detect protein coding sequences in sequenced DNA. PMID:7038623
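
    The third-position bias described above is straightforward to measure. The sketch below is an illustration only, not the computer analysis presented in the paper: the function name `third_position_au_fraction` and the default window of 10 codons are assumptions. It reports the fraction of codons following the initiator that carry A or U in the third position.

```python
def third_position_au_fraction(cds, n_codons=10):
    """Fraction of the first n_codons after the initiator with A or U in position 3.

    `cds` is a coding sequence beginning at the initiator codon; DNA input is
    accepted and T is treated as U.
    """
    seq = cds.upper().replace("T", "U")
    # Skip the initiator (positions 0-2) and keep complete codons only.
    codons = [seq[i:i + 3] for i in range(3, len(seq) - 2, 3)][:n_codons]
    if not codons:
        return 0.0
    return sum(codon[2] in "AU" for codon in codons) / len(codons)
```

    Comparing this value for candidate start sites against the value for internal codons is one possible way to use the bias when screening sequenced DNA for coding regions.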

  15. The most frequent short sequences in non-coding DNA.

    PubMed

    Subirana, Juan A; Messeguer, Xavier

    2010-03-01

    The purpose of this work is to determine the most frequent short sequences in non-coding DNA. They may play a role in maintaining the structure and function of eukaryotic chromosomes. We present a simple method for the detection and analysis of such sequences in several genomes, including Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens. We also study two chromosomes of man and mouse with a length similar to the whole genomes of the other species. We provide a list of the most common sequences of 9-14 bases in each genome. As expected, they are present in human Alu sequences. Our programs may also give a graph and a list of their position in the genome. Detection of clusters is also possible. In most cases, these sequences contain few alternating regions. Their intrinsic structure and their influence on nucleosome formation are not known. In particular, we have found new features of short sequences in C. elegans, which are distributed in heterogeneous clusters. They appear as punctuation marks in the chromosomes. Such clusters are not found in either A. thaliana or D. melanogaster. We discuss the possibility that they play a role in centromere function and homolog recognition in meiosis. PMID:19966278
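
    Finding the most frequent short sequences amounts to counting k-mers. The sketch below is an assumption-laden illustration rather than the authors' programs, which the abstract does not describe: the function name `most_frequent_kmers` is hypothetical, windows overlap, and only one strand is counted.

```python
from collections import Counter


def most_frequent_kmers(seq, k, top=5):
    """Return the `top` most common k-mers in `seq` as (kmer, count) pairs."""
    seq = seq.upper()
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Ignore windows that contain ambiguous bases such as N.
        if set(kmer) <= valid:
            counts[kmer] += 1
    return counts.most_common(top)
```

    For the 9-14 base range studied in the paper, the function would be called once per value of `k`. Whole-genome input would need a more memory-efficient counter than this in-memory dictionary.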

  16. Methods for analyzing nucleic acid sequences

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid. The method provides a complex comprising a polymerase enzyme, a target nucleic acid molecule, and a primer, wherein the complex is immobilized on a support. A fluorescent label is attached to a terminal phosphate group of the nucleotide or nucleotide analog. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The time duration of the signal from labeled nucleotides or nucleotide analogs that become incorporated is distinguished from freely diffusing labels by a longer retention in the observation volume for the nucleotides or nucleotide analogs that become incorporated than for the freely diffusing labels.

  17. The Cipher Code of Simple Sequence Repeats in “Vampire Pathogens”

    PubMed Central

    Zou, Geng; Bello-Orti, Bernardo; Aragon, Virginia; Tucker, Alexander W.; Luo, Rui; Ren, Pinxing; Bi, Dingren; Zhou, Rui; Jin, Hui

    2015-01-01

    Blood inside mammals is a forbidden area for the majority of prokaryotic microbes; however, microbes with red blood cell tropism, such as “vampire pathogens” (VP), succeed in matching scarce nutrients and surviving strong immune reactions. Here, we found that VP of Mycoplasma, Rhizobiales, and Rickettsiales showed significantly higher counts of (AG)n dimeric simple sequence repeats (Di-SSRs) in the genomes, coding and non-coding regions than non-vampire pathogens (N_VP). Regression analysis indicated a significant correlation between GC content and the span of (AG)n-Di-SSR variation. Gene Ontology (GO) terms with abundance of (AG)3-Di-SSRs shared by the VP strains were associated with purine nucleotide metabolism (FDR < 0.01), indicating an adaptation to the limited availability of purine and nucleotide precursors in blood. Di-amino acids coded by (AG)n-Di-SSRs included all three six-fold code amino acids (Arg, Leu and Ser) and significantly higher counts of Di-amino acids coded by (AG)3, (GA)3, and (TC)3 in VP than N_VP. Furthermore, significant differences (P < 0.001) in the numbers of triplexes formed from (AG)n-Di-SSRs between VP and N_VP in Mycoplasma suggested the potential role of (AG)n-Di-SSRs in gene regulation. PMID:26215592
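
    Counting dimeric repeats such as (AG)n can be done with a regular expression. The sketch below is illustrative only: the function name `find_dimer_ssrs` is hypothetical, and treating (AG)3 as "three or more consecutive AG units" is an assumption, since the abstract does not say whether exact or minimum repeat counts were used.

```python
import re


def find_dimer_ssrs(seq, unit="AG", min_repeats=3):
    """Return (start, repeat_count) for each tract of `unit` repeated at least min_repeats times."""
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), min_repeats))
    return [
        (match.start(), len(match.group()) // len(unit))
        for match in pattern.finditer(seq.upper())
    ]
```

    Running the function separately on coding and non-coding regions, and with units such as GA or TC, would give per-region counts of the kind the study compares between groups.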

  18. The influence of protein coding sequences on protein folding rates of all-β proteins.

    PubMed

    Li, Rui Fang; Li, Hong

    2011-06-01

    It is currently believed that the protein folding rate is related to the protein's structure and amino acid sequence. However, few studies have examined whether the protein folding rate is influenced by its corresponding mRNA sequence. In this paper, we analyzed the possible relationship between the protein folding rates and the corresponding mRNA sequences. The content of guanine and cytosine (GC content) of palindromes in the protein coding sequence was introduced as a new parameter and added to Gromiha's model of predicting protein folding rates to examine its effect on the protein folding process. The multiple linear regression analysis and jack-knife test show that the new parameter is significant. The linear correlation coefficient between the experimental and the predicted values of the protein folding rates increased significantly from 0.96 to 0.99, and the population variance decreased from 0.50 to 0.24 compared with Gromiha's results. The results show that the GC content of palindromes in the corresponding protein coding sequence really influences the protein folding rate. Further analysis indicates that this kind of effect mostly comes from the synonymous codon usage and from the information of palindrome structure itself, but not from the translation information from codons to amino acids. PMID:21613670

  19. Licensee Event Report sequence coding and search procedure workshop

    SciTech Connect

    Cottrell, W.B.; Gallaher, R.B.

    1981-03-01

    Since mid-1980, the Office for Analysis and Evaluation of Operational Data (AEOD) of the Nuclear Regulatory Commission (NRC) has been developing procedures for the systematic review and analysis of Licensee Event Reports (LERs). These procedures generally address several areas of concern, including identification of significant trends and patterns, event sequence of occurrences, component failures, and system and plant effects. The AEOD and NSIC conducted a workshop on the new coding procedure at the American Museum of Science and Energy in Oak Ridge, TN, on November 24, 1980.

  20. Unnatural reactive amino acid genetic code additions

    SciTech Connect

    Deiters, Alexander; Cropp, T. Ashton; Chin, Jason W.; Anderson, J. Christopher; Schultz, Peter G.

    2011-08-09

    This invention provides compositions and methods for producing translational components that expand the number of genetically encoded amino acids in eukaryotic cells. The components include orthogonal tRNAs, orthogonal aminoacyl-tRNA synthetases, pairs of tRNAs/synthetases and unnatural amino acids. Proteins and methods of producing proteins with unnatural amino acids in eukaryotic cells are also provided.

  1. Unnatural reactive amino acid genetic code additions

    SciTech Connect

    Deiters, Alexander; Cropp, Ashton T; Chin, Jason W; Anderson, Christopher J; Schultz, Peter G

    2013-05-21

    This invention provides compositions and methods for producing translational components that expand the number of genetically encoded amino acids in eukaryotic cells. The components include orthogonal tRNAs, orthogonal aminoacyl-tRNA synthetases, pairs of tRNAs/synthetases and unnatural amino acids. Proteins and methods of producing proteins with unnatural amino acids in eukaryotic cells are also provided.

  2. Unnatural reactive amino acid genetic code additions

    SciTech Connect

    Deiters, Alexander; Cropp, T. Ashton; Chin, Jason W.; Anderson, J. Christopher; Schultz, Peter G.

    2014-08-26

    This invention provides compositions and methods for producing translational components that expand the number of genetically encoded amino acids in eukaryotic cells. The components include orthogonal tRNAs, orthogonal aminoacyl-tRNA synthetases, orthogonal pairs of tRNAs/synthetases and unnatural amino acids. Proteins and methods of producing proteins with unnatural amino acids in eukaryotic cells are also provided.

  3. Unnatural reactive amino acid genetic code additions

    SciTech Connect

    Deiters, Alexander; Cropp, T. Ashton; Chin, Jason W.; Anderson, J. Christopher; Schultz, Peter G.

    2011-02-15

    This invention provides compositions and methods for producing translational components that expand the number of genetically encoded amino acids in eukaryotic cells. The components include orthogonal tRNAs, orthogonal aminoacyl-tRNA synthetases, orthogonal pairs of tRNAs/synthetases and unnatural amino acids. Proteins and methods of producing proteins with unnatural amino acids in eukaryotic cells are also provided.

  4. Code-Time Diversity for Direct Sequence Spread Spectrum Systems

    PubMed Central

    Hassan, A. Y.

    2014-01-01

    Time diversity is achieved in direct sequence spread spectrum by receiving different faded delayed copies of the transmitted symbols from different uncorrelated channel paths when the transmission signal bandwidth is greater than the coherence bandwidth of the channel. In this paper, a new time diversity scheme is proposed for spread spectrum systems. It is called code-time diversity. In this new scheme, N spreading codes are used to transmit one data symbol over N successive symbol intervals. The diversity order in the proposed scheme equals the number of spreading codes used, N, multiplied by the number of uncorrelated paths of the channel, L. The paper presents the transmitted signal model. Two demodulator structures are proposed based on the received signal models from Rayleigh flat and frequency selective fading channels. Probability of error in the proposed diversity scheme is also calculated for the same two fading channels. Finally, simulation results are presented and compared with those of maximal ratio combiner (MRC) and multiple-input and multiple-output (MIMO) systems. PMID:24982925

  5. Serotype-specific glycoprotein of simian 11 rotavirus: coding assignment and gene sequence.

    PubMed Central

    Both, G W; Mattick, J S; Bellamy, A R

    1983-01-01

    Cloned DNA copies of the double-stranded RNA genomic segments of simian 11 rotavirus have been used to determine the coding assignment for VP7, the type-specific antigen of this virus. Translation of hybrid-selected mRNAs in an in vitro system supplemented with canine pancreatic microsomes permitted VP7 to be assigned to segment 9 and the two nonstructural viral proteins NCVP4 and NCVP3, to segments 7 and 8, respectively. Hybridization of cloned DNA probes for segments 7-9 with the corresponding segments of human rotavirus Wa confirmed these assignments. The complete nucleotide sequence of gene 9 has been determined. The deduced amino acid sequence reveals VP7 to be 326 amino acids in length with two NH2-terminal hydrophobic regions and a single glycosylation site at residues 69-71. PMID:6304692

  6. An integrative approach to predicting the functional effects of non-coding and coding sequence variation

    PubMed Central

    Shihab, Hashem A.; Rogers, Mark F.; Gough, Julian; Mort, Matthew; Cooper, David N.; Day, Ian N. M.; Gaunt, Tom R.; Campbell, Colin

    2015-01-01

    Motivation: Technological advances have enabled the identification of an increasingly large spectrum of single nucleotide variants within the human genome, many of which may be associated with monogenic disease or complex traits. Here, we propose an integrative approach, named FATHMM-MKL, to predict the functional consequences of both coding and non-coding sequence variants. Our method utilizes various genomic annotations, which have recently become available, and learns to weight the significance of each component annotation source. Results: We show that our method outperforms current state-of-the-art algorithms, CADD and GWAVA, when predicting the functional consequences of non-coding variants. In addition, FATHMM-MKL is comparable to the best of these algorithms when predicting the impact of coding variants. The method includes a confidence measure to rank order predictions. Availability and implementation: The FATHMM-MKL webserver is available at: http://fathmm.biocompute.org.uk Contact: H.Shihab@bristol.ac.uk or Mark.Rogers@bristol.ac.uk or C.Campbell@bristol.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25583119

  7. Detection of nucleic acid sequences by invader-directed cleavage

    DOEpatents

    Brow, Mary Ann D.; Hall, Jeff Steven Grotelueschen; Lyamichev, Victor; Olive, David Michael; Prudent, James Robert

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The 5' nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge.

  8. Molecular cloning and amino acid sequence of human 5-lipoxygenase

    SciTech Connect

    Matsumoto, T.; Funk, C.D.; Radmark, O.; Hoeoeg, J.O.; Joernvall, H.; Samuelsson, B.

    1988-01-01

    5-Lipoxygenase (EC 1.13.11.34), a Ca²⁺- and ATP-requiring enzyme, catalyzes the first two steps in the biosynthesis of the peptidoleukotrienes and the chemotactic factor leukotriene B₄. A cDNA clone corresponding to 5-lipoxygenase was isolated from a human lung lambda gt11 expression library by immunoscreening with a polyclonal antibody. Additional clones from a human placenta lambda gt11 cDNA library were obtained by plaque hybridization with the ³²P-labeled lung cDNA clone. Sequence data obtained from several overlapping clones indicate that the composite DNAs contain the complete coding region for the enzyme. From the deduced primary structure, 5-lipoxygenase encodes a 673 amino acid protein with a calculated molecular weight of 77,839. Direct analysis of the native protein and its proteolytic fragments confirmed the deduced composition, the amino-terminal amino acid sequence, and the structure of many internal segments. 5-Lipoxygenase has no apparent sequence homology with leukotriene A₄ hydrolase or Ca²⁺-binding proteins. RNA blot analysis indicated substantial amounts of an mRNA species of approximately 2700 nucleotides in leukocytes, lung, and placenta.

  9. Nanopore Sequencing: Electrical Measurements of the Code of Life

    PubMed Central

    Timp, Winston; Mirsaidov, Utkur M.; Wang, Deqiang; Comer, Jeff; Aksimentiev, Aleksei; Timp, Gregory

    2011-01-01

    Sequencing a single molecule of deoxyribonucleic acid (DNA) using a nanopore is a revolutionary concept because it combines the potential for long read lengths (>5 kbp) with high speed (1 bp/10 ns), while obviating the need for costly amplification procedures due to the exquisite single molecule sensitivity. The prospects for implementing this concept seem bright. The cost savings from the removal of required reagents, coupled with the speed of nanopore sequencing places the $1000 genome within grasp. However, challenges remain: high fidelity reads demand stringent control over both the molecular configuration in the pore and the translocation kinetics. The molecular configuration determines how the ions passing through the pore come into contact with the nucleotides, while the translocation kinetics affect the time interval in which the same nucleotides are held in the constriction as the data is acquired. Proteins like α-hemolysin and its mutants offer exquisitely precise self-assembled nanopores and have demonstrated the facility for discriminating individual nucleotides, but it is currently difficult to design protein structure ab initio, which frustrates tailoring a pore for sequencing genomic DNA. Nanopores in solid-state membranes have been proposed as an alternative because of the flexibility in fabrication and ease of integration into a sequencing platform. Preliminary results have shown that with careful control of the dimensions of the pore and the shape of the electric field, control of DNA translocation through the pore is possible. Furthermore, discrimination between different base pairs of DNA may be feasible. Thus, a nanopore promises inexpensive, reliable, high-throughput sequencing, which could thrust genomic science into personal medicine. PMID:21572978

  10. A molecular code dictates sequence-specific DNA recognition by homeodomains.

    PubMed Central

    Damante, G; Pellizzari, L; Esposito, G; Fogolari, F; Viglino, P; Fabbro, D; Tell, G; Formisano, S; Di Lauro, R

    1996-01-01

    Most homeodomains bind to DNA sequences containing the motif 5'-TAAT-3'. The homeodomain of thyroid transcription factor 1 (TTF-1HD) binds to sequences containing a 5'-CAAG-3' core motif, delineating a new mechanism for differential DNA recognition by homeodomains. We investigated the molecular basis of the DNA binding specificity of TTF-1HD by both structural and functional approaches. As already suggested by the three-dimensional structure of TTF-1HD, the DNA binding specificities of the TTF-1, Antennapedia and Engrailed homeodomains, either wild-type or mutants, indicated that the amino acid residue in position 54 is involved in the recognition of the nucleotide at the 3' end of the core motif 5'-NAAN-3'. The nucleotide at the 5' position of this core sequence is recognized by the amino acids located in position 6, 7 and 8 of the TTF-1 and Antennapedia homeodomains. These data, together with previous suggestions on the role of amino acids in position 50, indicate that the DNA binding specificity of homeodomains can be determined by a combinatorial molecular code. We also show that some specific combinations of the key amino acid residues involved in DNA recognition do not follow a simple, additive rule. PMID:8890172

  11. Properties of Sequence Conservation in Upstream Regulatory and Protein Coding Sequences among Paralogs in Arabidopsis thaliana

    NASA Astrophysics Data System (ADS)

    Richardson, Dale N.; Wiehe, Thomas

    Whole genome duplication (WGD) has catalyzed the formation of new species, genes with novel functions, altered expression patterns, complexified signaling pathways and has provided organisms a level of genetic robustness. We studied the long-term evolution and interrelationships of 5’ upstream regulatory sequences (URSs), protein coding sequences (CDSs) and expression correlations (EC) of duplicated gene pairs in Arabidopsis. Three distinct methods revealed significant evolutionary conservation between paralogous URSs and were highly correlated with microarray-based expression correlation of the respective gene pairs. Positional information on exact matches between sequences unveiled the contribution of micro-chromosomal rearrangements on expression divergence. A three-way rank analysis of URS similarity, CDS divergence and EC uncovered specific gene functional biases. Transcription factor activity was associated with gene pairs exhibiting conserved URSs and divergent CDSs, whereas a broad array of metabolic enzymes was found to be associated with gene pairs showing diverged URSs but conserved CDSs.

  12. Molecular cloning and sequencing of mRNAs coding for minor adult globin polypeptides of Xenopus laevis.

    PubMed Central

    Knöchel, W; Meyerhof, W; Hummel, S; Grundmann, U

    1983-01-01

    Globin mRNA was isolated from immature red blood cells of an adult Xenopus laevis female. mRNA/cDNA hybrids were integrated in the Pst I cleavage site of pBR 322 by G/C tailing, and cloned in Escherichia coli strain HB 101. By restriction site analysis as well as hybridization behaviour we identified two clones coding for minor adult alpha and beta globin chains. Nucleotide sequence analysis and derived amino acid sequences are presented. PMID:6298748

  13. Amino acid sequence of Japanese quail (Coturnix japonica) and northern bobwhite (Colinus virginianus) myoglobin.

    PubMed

    Goodson, John; Beckstead, Robert B; Payne, Jason; Singh, Rakesh K; Mohan, Anand

    2015-08-15

    Myoglobin has an important physiological role in vertebrates, and as the primary sarcoplasmic pigment in meat, influences quality perception and consumer acceptability. In this study, the amino acid sequences of Japanese quail and northern bobwhite myoglobin were deduced by cDNA cloning of the coding sequence from mRNA. Japanese quail myoglobin was isolated from quail cardiac muscles, purified using ammonium sulphate precipitation and gel-filtration, and subjected to multiple enzymatic digestions. Mass spectrometry corroborated the deduced protein amino acid sequence at the protein level. Sequence analysis revealed both species' myoglobin structures consist of 153 amino acids, differing at only three positions. When compared with chicken myoglobin, Japanese quail showed 98% sequence identity, and northern bobwhite 97% sequence identity. The myoglobin in both quail species contained eight histidine residues instead of the nine present in chicken and turkey. PMID:25794748
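
    The identity figures above follow from simple position-by-position comparison. As a hedged illustration (the function name `percent_identity` is hypothetical, and the sequences are assumed to be already aligned and of equal length), two 153-residue sequences that differ at three positions share 150/153, or about 98.0%, of their residues.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)
```

    Sequences of unequal length would first need a pairwise alignment, which this sketch does not perform.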

  14. Hybridization and sequencing of nucleic acids using base pair mismatches

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2001-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  15. A search for conserved sequences in coding regions reveals that the let-7 microRNA targets Dicer within its coding sequence

    PubMed Central

    Forman, Joshua J.; Legesse-Miller, Aster; Coller, Hilary A.

    2008-01-01

    Recognition sites for microRNAs (miRNAs) have been reported to be located in the 3′ untranslated regions of transcripts. In a computational screen for highly conserved motifs within coding regions, we found an excess of sequences conserved at the nucleotide level within coding regions in the human genome, the highest scoring of which are enriched for miRNA target sequences. To validate our results, we experimentally demonstrated that the let-7 miRNA directly targets the miRNA-processing enzyme Dicer within its coding sequence, thus establishing a mechanism for a miRNA/Dicer autoregulatory negative feedback loop. We also found computational evidence to suggest that miRNA target sites in coding regions and 3′ UTRs may differ in mechanism. This work demonstrates that miRNAs can directly target transcripts within their coding region in animals, and it suggests that a complete search for the regulatory targets of miRNAs should be expanded to include genes with recognition sites within their coding regions. As more genomes are sequenced, the methodological approach that we used for identifying motifs with high sequence conservation will be increasingly valuable for detecting functional sequence motifs within coding regions. PMID:18812516

  16. A minimal sequence code for switching protein structure and function.

    PubMed

    Alexander, Patrick A; He, Yanan; Chen, Yihong; Orban, John; Bryan, Philip N

    2009-12-15

    We present here a structural and mechanistic description of how a protein changes its fold and function, mutation by mutation. Our approach was to create 2 proteins that (i) are stably folded into 2 different folds, (ii) have 2 different functions, and (iii) are very similar in sequence. In this simplified sequence space we explore the mutational path from one fold to another. We show that an IgG-binding, 4beta+alpha fold can be transformed into an albumin-binding, 3-alpha fold via a mutational pathway in which neither function nor native structure is completely lost. The stabilities of all mutants along the pathway are evaluated, key high-resolution structures are determined by NMR, and an explanation of the switching mechanism is provided. We show that the conformational switch from 4beta+alpha to 3-alpha structure can occur via a single amino acid substitution. On one side of the switch point, the 4beta+alpha fold is >90% populated (pH 7.2, 20 degrees C). A single mutation switches the conformation to the 3-alpha fold, which is >90% populated (pH 7.2, 20 degrees C). We further show that a bifunctional protein exists at the switch point with affinity for both IgG and albumin. PMID:19923431

  17. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request. SUMMARY: The United States....'' SUPPLEMENTARY INFORMATION: I. Abstract Patent applications that contain nucleotide and/or amino acid sequence disclosures must include a copy of the sequence listing in accordance with the requirements in 37 CFR...

  18. Cloning and nucleotide sequence of the gene coding for citrate synthase from a thermotolerant Bacillus sp

    SciTech Connect

    Schendel, F.J.; August, P.R.; Anderson, C.R.; Flickinger, M.C.; Hanson, R.S.

    1992-01-01

    Acetate salts are emerging as potentially attractive bulk chemicals for a variety of environmental applications, for example, as catalysts to facilitate combustion of high-sulfur coal by electrical utilities and as the biodegradable noncorrosive highway deicing salt calcium magnesium acetate. The structural gene coding for citrate synthase from the gram-positive soil isolate Bacillus sp. strain C4 (ATCC 55182) capable of secreting acetic acid at pH 5.0 to 7.0 in the presence of dolime has been cloned from a genomic library by complementation of an Escherichia coli auxotrophic mutant lacking citrate synthase. The nucleotide sequence of the entire 3.1-kb HindIII fragment has been determined, and one major open reading frame was found coding for citrate synthase (ctsA). Citrate synthase from Bacillus sp. strain C4 was found to be a dimer (Mᵣ, 84,500) with a subunit with an Mᵣ of 42,000. The N-terminal sequence was found to be identical with that predicted from the gene sequence. The kinetics were best fit to a bisubstrate enzyme with an ordered mechanism. Bacillus sp. strain C4 citrate synthase was not activated by potassium chloride and was not inhibited by NADH, ATP, ADP, or AMP at levels up to 1 mM. The predicted amino acid sequence was compared with that of the E. coli, Acinetobacter anitratum, Pseudomonas aeruginosa, Rickettsia prowazekii, porcine heart, and Saccharomyces cerevisiae cytoplasmic and mitochondrial enzymes.

  19. Mutations analysis of C1 inhibitor coding sequence gene among Portuguese patients with hereditary angioedema.

    PubMed

    Martinho, A; Mendes, J; Simões, O; Nunes, R; Gomes, J; Dias Castro, E; Leiria-Pinto, P; Ferreira, M B; Pereira, C; Castel-Branco, M G; Pais, L

    2013-04-01

    Mutations that modify the amino acid sequence of C1-INH (except Val458Met) are associated with HAE. More than 200 different mutations scattered across the entire C1-INH gene have been reported. The main objective of this study was to report the mutational findings in a HAE cohort of 138 Portuguese patients followed in specialized consultation all over the country. DNA was extracted from peripheral blood with QiaSymphony BioRobot (QIAGEN Portugal). The sequence reactions were performed by using a DNA sequencing kit (Big Dye terminator cycle sequencing v1.1/v3.1 from Applied Biosystems) and sequencing products were immediately submitted to direct sequencing on an Applied Biosystem 3130 DNA Analyser. DNA sequences were analyzed at four different stages. Raw data and sequence alignments of all 8 exons and intron-exon boundaries were performed for each patient individually with SeqScape software and using SERPING1 gene NG_009625 of 24,300 bp (12-March-2011) as reference sequence. Sequence comparisons among patients and controls were performed with software CodonCode Aligner v.3.7 from CodonCode Corp and with Geneious 4.5 from Biomatters Lda. A total of 94 point mutations were observed among patients, and 67% of them were located on exon 8. In addition, we noticed one previously undescribed stop codon at position c.1459 C>T in three different patients. Translation termination was also found in exons 3 and 7, as a result of mutations at positions c.481A>7 and c.1174C>T. In this population, the prevalence of the missense mutation p.Arg444Cys was 39 out of 42. Mutational analysis revealed 22 different pathogenic mutations, of which 64% were not described in the HAE database. Although identification of disease-causing mutations is not necessary to establish HAE diagnosis, studies on gene expression and characterization of rearrangements in SERPING1 gene are suggested in order to get new insights on function and genetic tests of C1 inhibitor. PMID:23123409

  20. Biosynthesis of riboflavin: cloning, sequencing, mapping, and expression of the gene coding for GTP cyclohydrolase II in Escherichia coli.

    PubMed Central

    Richter, G; Ritz, H; Katzenmeier, G; Volk, R; Kohnle, A; Lottspeich, F; Allendorf, D; Bacher, A

    1993-01-01

    GTP cyclohydrolase II catalyzes the first committed step in the biosynthesis of riboflavin. The gene coding for this enzyme in Escherichia coli has been cloned by marker rescue. Sequencing indicated an open reading frame of 588 bp coding for a 21.8-kDa peptide of 196 amino acids. The gene was mapped to a position at 28.2 min on the E. coli chromosome and is identical with ribA. GTP cyclohydrolase II was overexpressed in a recombinant strain carrying a plasmid with the cloned gene. The enzyme was purified to homogeneity from the recombinant strain. The N-terminal sequence determined by Edman degradation was identical to the predicted sequence. The sequence is homologous to the 3' part of the central open reading frame in the riboflavin operon of Bacillus subtilis. PMID:8320220

  1. Complete amino acid sequence of the Mu heavy chain of a human IgM immunoglobulin.

    PubMed

    Putnam, F W; Florent, G; Paul, C; Shinoda, T; Shimizu, A

    1973-10-19

    The amino acid sequence of the mu chain of a human IgM immunoglobulin, including the location of all disulfide bridges and oligosaccharides, has been determined. The homology of the constant regions of immunoglobulin mu, gamma, alpha, and epsilon heavy chains reveals evolutionary relationships and suggests that two genes code for each heavy chain. PMID:4742735

  2. Matrix genes of measles virus and canine distemper virus: cloning, nucleotide sequences, and deduced amino acid sequences.

    PubMed Central

    Bellini, W J; Englund, G; Richardson, C D; Rozenblatt, S; Lazzarini, R A

    1986-01-01

    The nucleotide sequences encoding the matrix (M) proteins of measles virus (MV) and canine distemper virus (CDV) were determined from cDNA clones containing these genes in their entirety. In both cases, single open reading frames specifying basic proteins of 335 amino acid residues were predicted from the nucleotide sequences. Both viral messages were composed of approximately 1,450 nucleotides and contained 400 nucleotides of presumptive noncoding sequences at their respective 3' ends. MV and CDV M-protein-coding regions were 67% homologous at the nucleotide level and 76% homologous at the amino acid level. Only chance homology was observed in the 400-nucleotide trailer sequences. Comparisons of the M protein sequences of MV and CDV with the sequence reported for Sendai virus (B. M. Blumberg, K. Rose, M. G. Simona, L. Roux, C. Giorgi, and D. Kolakofsky, J. Virol. 52:656-663; Y. Hidaka, T. Kanda, K. Iwasaki, A. Nomoto, T. Shioda, and H. Shibuta, Nucleic Acids Res. 12:7965-7973) indicated the greatest homology among these M proteins in the carboxyterminal third of the molecule. Secondary-structure analyses of this shared region indicated a structurally conserved, hydrophobic sequence which possibly interacted with the lipid bilayer. PMID:3754588

  3. Genetic code correlations - Amino acids and their anticodon nucleotides

    NASA Technical Reports Server (NTRS)

    Weber, A. L.; Lacey, J. C., Jr.

    1978-01-01

    The data here show direct correlations between both the hydrophobicity and the hydrophilicity of the homocodonic amino acids and their anticodon nucleotides. While the differences between properties of uracil and cytosine derivatives are small, further data show that uracil has an affinity for charged species. Although these data suggest that molecular relationships between amino acids and anticodons were responsible for the origin of the code, it is not clear what the mechanism of the origin might have been.

  4. Conversion of amino-acid sequence in proteins to classical music: search for auditory patterns

    PubMed Central

    2007-01-01

    We have converted genome-encoded protein sequences into musical notes to reveal auditory patterns without compromising musicality. We derived a reduced range of 13 base notes by pairing similar amino acids and distinguishing them using variations of three-note chords and codon distribution to dictate rhythm. The conversion will help make genomic coding sequences more approachable for the general public, young children, and vision-impaired scientists. PMID:17477882

  5. Predicting intrinsic disorder from amino acid sequence.

    PubMed

    Obradovic, Zoran; Peng, Kang; Vucetic, Slobodan; Radivojac, Predrag; Brown, Celeste J; Dunker, A Keith

    2003-01-01

    Blind predictions of intrinsic order and disorder were made on 42 proteins subsequently revealed to contain 9,044 ordered residues, 284 disordered residues in 26 segments of length 30 residues or less, and 281 disordered residues in 2 disordered segments of length greater than 30 residues. The accuracies of the six predictors used in this experiment ranged from 77% to 91% for the ordered regions and from 56% to 78% for the disordered segments. The average of the order and disorder predictions ranged from 73% to 77%. The prediction of disorder in the shorter segments was poor, from 25% to 66% correct, while the prediction of disorder in the longer segments was better, from 75% to 95% correct. Four of the predictors were composed of ensembles of neural networks. This enabled them to deal more efficiently with the large asymmetry in the training data through diversified sampling from the significantly larger ordered set and achieve better accuracy on ordered and long disordered regions. The exclusive use of long disordered regions for predictor training likely contributed to the disparity of the predictions on long versus short disordered regions, while averaging the output values over 61-residue windows to eliminate short predictions of order or disorder probably contributed to the even greater disparity for three of the predictors. This experiment supports the predictability of intrinsic disorder from amino acid sequence. PMID:14579347
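
    The effect of averaging predictor outputs over a window, mentioned in this abstract, can be sketched as follows. This is a minimal illustration, not the predictors' actual code: the function name is hypothetical, and shrinking the window at the sequence ends is an assumption, since the abstract does not describe edge handling.

```python
def smooth_scores(scores, window=61):
    """Average per-residue scores over a centered window of odd length.

    Near the ends of the sequence the window is truncated to the
    residues that exist (an assumption made for this sketch).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        segment = scores[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# A single high-scoring residue is diluted by its neighbours.
raw = [0.0, 0.0, 1.0, 0.0, 0.0]
averaged = smooth_scores(raw, window=3)
```

    In the toy input, the isolated score of 1.0 drops to one third after averaging, which illustrates why wide windows tend to erase short predicted segments.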

  6. Rat hepatic glutaminase: identification of the full coding sequence and characterization of a functional promoter.

    PubMed Central

    Chung-Bok, M I; Vincent, N; Jhala, U; Watford, M

    1997-01-01

    Glutamine catabolism in mammalian liver is catalysed by a unique isoenzyme of phosphate-activated glutaminase. The full coding and 5' untranslated sequence for rat hepatic glutaminase was isolated by screening lambda ZAP cDNA libraries and a Charon 4a rat genomic library. The sequence produces a mRNA 2225 nt in length, encoding a polypeptide of 535 amino acid residues with a calculated molecular mass of 59.2 kDa. The deduced amino acid sequence of rat liver glutaminase shows 86% similarity to that of rat kidney glutaminase and 65% similarity to a putative glutaminase from Caenorhabditis elegans. A genomic clone to rat liver glutaminase was isolated that contains 3.5 kb of the gene and 7.5 kb of the 5' flanking region. The 1 kb immediately upstream of the hepatic glutaminase gene (from -1022 to +48) showed functional promoter activity in HepG2 hepatoma cells. This promoter region did not respond to treatment with cAMP, but was highly responsive (10-fold stimulation) to the synthetic glucocorticoid dexamethasone. Subsequent 5' deletion analysis indicated that the promoter region between -103 and +48 was sufficient for basal promoter activity. This region does not contain an identifiable TATA element, indicating that transcription of the glutaminase gene is driven by a TATA-less promoter. The region responsive to glucocorticoids was mapped to -252 to -103 relative to the transcription start site. PMID:9164856

  7. Coding in 2D: Using Intentional Dispersity to Enhance the Information Capacity of Sequence-Coded Polymer Barcodes.

    PubMed

    Laure, Chloé; Karamessini, Denise; Milenkovic, Olgica; Charles, Laurence; Lutz, Jean-François

    2016-08-26

    A 2D approach was studied for the design of polymer-based molecular barcodes. Uniform oligo(alkoxyamine amide)s, containing a monomer-coded binary message, were synthesized by orthogonal solid-phase chemistry. Sets of oligomers with different chain-lengths were prepared. The physical mixture of these uniform oligomers leads to an intentional dispersity (1st dimension fingerprint), which is measured by electrospray mass spectrometry. Furthermore, the monomer sequence of each component of the mass distribution can be analyzed by tandem mass spectrometry (2nd dimension sequencing). By summing the sequence information of all components, a binary message can be read. A 4-bytes extended ASCII-coded message was written on a set of six uniform oligomers. Alternatively, a 3-bytes sequence was written on a set of five oligomers. In both cases, the coded binary information was recovered. PMID:27484303
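
    The size of the binary message described here can be illustrated with a short encoding sketch. It only shows how a 4-byte text maps to 32 bits and back; how the bits are distributed over the individual oligomers is not stated in the abstract and is not modelled. The function names are hypothetical, and Latin-1 is used as a stand-in for "extended ASCII" (an assumption).

```python
def message_to_bits(message):
    """Encode text as a bit string, 8 bits per character (Latin-1 assumed)."""
    data = message.encode("latin-1")
    return "".join(f"{byte:08b}" for byte in data)


def bits_to_message(bits):
    """Decode a bit string produced by message_to_bits."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("latin-1")


bits = message_to_bits("Code")   # hypothetical 4-byte message
recovered = bits_to_message(bits)
```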

  8. Cloning and nucleotide sequence of the genes coding for the Sau96I restriction and modification enzymes.

    PubMed Central

    Szilák, L; Venetianer, P; Kiss, A

    1990-01-01

    The genes coding for the GGNCC specific Sau96I restriction and modification enzymes were cloned and expressed in E. coli. The DNA sequence predicts a 430 amino acid protein (Mr: 49,252) for the methyltransferase and a 261 amino acid protein (Mr: 30,486) for the endonuclease. No protein sequence similarity was detected between the Sau96I methyltransferase and endonuclease. The methyltransferase contains the sequence elements characteristic for m5C-methyltransferases. In addition to this, M.Sau96I shows similarity, also in the variable region, with one m5C-methyltransferase (M.SinI) which has closely related recognition specificity (GGA/TCC). M.Sau96I methylates the internal cytosine within the GGNCC recognition sequence. The Sau96I endonuclease appears to act as a monomer. PMID:2204026

  9. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2002-01-01

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  10. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2006-07-04

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  11. Kit for detecting nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2001-01-01

    A kit is provided for detecting a target nucleic acid sequence in a sample, the kit comprising: a first hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the first hybridization probe including a first complexing agent for forming a binding pair with a second complexing agent; and a second hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the first hybridization probe does not selectively hybridize, the second hybridization probe including a detectable marker; a third hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the third hybridization probe including the same detectable marker as the second hybridization probe; and a fourth hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the third hybridization probe does not selectively hybridize, the fourth hybridization probe including the first complexing agent for forming a binding pair with the second complexing agent; wherein the first and second hybridization probes are capable of simultaneously hybridizing to the target sequence and the third and fourth hybridization probes are capable of simultaneously hybridizing to the target sequence, the detectable marker is not present on the first or fourth hybridization probes and the first, second, third, and fourth hybridization probes each include a competitive nucleic acid sequence which is sufficiently complementary to a third portion of the target sequence that the competitive sequences of the first, second, third, and fourth hybridization probes compete with each other to hybridize to the third portion of 
the

  12. Functional annotation of non-coding sequence variants

    PubMed Central

    Ritchie, Graham R. S.; Dunham, Ian; Zeggini, Eleftheria; Flicek, Paul

    2016-01-01

    Identifying functionally relevant variants against the background of ubiquitous genetic variation is a major challenge in human genetics. For variants that fall in protein-coding regions our understanding of the genetic code and splicing allow us to identify likely candidates, but interpreting variants that fall outside of genic regions is more difficult. Here we present a new tool, GWAVA, which supports prioritisation of non-coding variants by integrating a range of annotations. PMID:24487584

  13. OrfPredictor: predicting protein-coding regions in EST-derived sequences.

    PubMed

    Min, Xiang Jia; Butler, Gregory; Storms, Reginald; Tsang, Adrian

    2005-07-01

    OrfPredictor is a web server designed for identifying protein-coding regions in expressed sequence tag (EST)-derived sequences. For query sequences with a hit in BLASTX, the program predicts the coding regions based on the translation reading frames identified in BLASTX alignments; otherwise, it predicts the most probable coding region based on the intrinsic signals of the query sequences. The output is the predicted peptide sequences in the FASTA format, and a definition line that includes the query ID, the translation reading frame and the nucleotide positions where the coding region begins and ends. OrfPredictor facilitates the annotation of EST-derived sequences, particularly for large-scale EST projects. OrfPredictor is available at https://fungalgenome.concordia.ca/tools/OrfPredictor.html. PMID:15980561

  14. Solid phase sequencing of double-stranded nucleic acids

    DOEpatents

    Fu, Dong-Jing; Cantor, Charles R.; Koster, Hubert; Smith, Cassandra L.

    2002-01-01

    This invention relates to methods for detecting and sequencing of target double-stranded nucleic acid sequences, to nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include nucleic acids in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated determination of molecular weights and identification of the target sequence.

  15. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    States, David J.

    2004-07-28

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  16. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    David J. States

    1998-08-01

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  17. Complete Coding Genome Sequence of Putative Novel Bluetongue Virus Serotype 27

    PubMed Central

    Jenckel, Maria; Bréard, Emmanuel; Schulz, Claudia; Sailleau, Corinne; Viarouge, Cyril; Hoffmann, Bernd; Beer, Martin; Zientara, Stéphan

    2015-01-01

    We announce the complete coding genome sequence of a novel bluetongue virus (BTV) serotype (BTV-n = putative BTV-27) detected in goats in Corsica, France, in 2014. Sequence analysis confirmed the closest relationship between sequences of the novel BTV serotype and BTV-25 and BTV-26, recently discovered in Switzerland and Kuwait, respectively. PMID:25767218

  18. Bean yellow mosaic, clover yellow vein, and pea mosaic are distinct potyviruses: evidence from coat protein gene sequences and molecular hybridization involving the 3' non-coding regions.

    PubMed

    Tracy, S L; Frenkel, M J; Gough, K H; Hanna, P J; Shukla, D D

    1992-01-01

    The sequences of the 3' 1019 nucleotides of the genome of an atypical strain of bean yellow mosaic virus (BYMV-S) and of the 3' 1018 nucleotides of the clover yellow vein virus (CYVV-B) genome have been determined. These sequences contain the complete coding region of the viral coat protein followed by a 3' non-coding region of 173 and 178 nucleotides for BYMV-S and CYVV-B, respectively. When the deduced amino acid sequences of the coat protein coding regions were compared, a sequence identity of 77% was found between the two viruses, and optimal alignment of the 3' untranslated regions of BYMV-S and CYVV-B gave a 65% identity. However, the degree of homology of the amino acid sequences of coat proteins of BYMV-S with the published sequences for three other strains of BYMV ranged from 88% to 94%, while the sequence homology of the 3' untranslated regions between the four strains of BYMV ranged between 86% and 95%. Amplified DNA probes corresponding to the 3' non-coding regions of BYMV-S and CYVV-B showed strong hybridization only with the strains of their respective viruses and not with strains of other potyviruses, including pea mosaic virus (PMV). The relatively low sequence identities between the BYMV-S and CYVV-B coat proteins and their 3' non-coding regions, together with the hybridization results, indicate that BYMV, CYVV, and PMV are distinct potyviruses. PMID:1731696
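
    Percent identity figures such as those quoted in this abstract are computed from aligned sequences. The sketch below shows one common way to do this; it is an illustration under stated assumptions, not the authors' procedure. The function name is hypothetical, and skipping gapped columns is an assumption, since the abstract does not say how gaps were treated.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical positions between two pre-aligned sequences.

    Columns where either sequence has a gap are skipped (assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: five ungapped columns, four of them identical.
identity = percent_identity("ACGT-A", "ACCTTA")
```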

  19. Draft genome sequence of the docosahexaenoic acid producing thraustochytrid Aurantiochytrium sp. T66.

    PubMed

    Liu, Bin; Ertesvåg, Helga; Aasen, Inga Marie; Vadstein, Olav; Brautaset, Trygve; Heggeset, Tonje Marita Bjerkan

    2016-06-01

    Thraustochytrids are unicellular, marine protists, and there is a growing industrial interest in these organisms, particularly because some species, including strains belonging to the genus Aurantiochytrium, accumulate high levels of docosahexaenoic acid (DHA). Here, we report the draft genome sequence of Aurantiochytrium sp. T66 (ATCC PRA-276), with a size of 43 Mbp, and 11,683 predicted protein-coding sequences. The data has been deposited at DDBJ/EMBL/Genbank under the accession LNGJ00000000. The genome sequence will contribute new insight into DHA biosynthesis and regulation, providing a basis for metabolic engineering of thraustochytrids. PMID:27222814

  20. Expression in bacteria of gB-glycoprotein-coding sequences of Herpes simplex virus type 2.

    PubMed

    Person, S; Warner, S C; Bzik, D J; Debroy, C; Fox, B A

    1985-01-01

    A plasmid with an insert that encodes the glycoprotein B(gB) gene of Herpes simplex virus type 2 (HSV-2) has been isolated. DNA sequences coding for a portion of the HSV-2 gB peptide were cloned into a bacterial lacZ alpha expression vector and used to transform Escherichia coli. Upon induction of lacZpo-promoted transcription, some of the bacteria became filamentous and produced inclusion bodies containing a large amount of a 65-kDal peptide that was shown to be precipitated by broad-spectrum antibodies to HSV-2 and HSV-1. The HSV-2 insert of one of these clones specifies amino acid residues corresponding to 135 through 629 of the gB of HSV-1 [Bzik et al., Virology 133 (1984) 301-314]. PMID:2412940

  1. From Artificial Amino Acids to Sequence-Defined Targeted Oligoaminoamides.

    PubMed

    Morys, Stephan; Wagner, Ernst; Lächelt, Ulrich

    2016-01-01

    Artificial oligoamino acids with appropriate protecting groups can be used for the sequential assembly of oligoaminoamides on solid-phase. With the help of these oligoamino acids multifunctional nucleic acid (NA) carriers can be designed and produced in highly defined topologies. Here we describe the synthesis of the artificial oligoamino acid Fmoc-Stp(Boc3)-OH, the subsequent assembly into sequence-defined oligomers and the formulation of tumor-targeted plasmid DNA (pDNA) polyplexes. PMID:27436323

  2. Sequence analysis of the 3' non-coding region of mouse immunoglobulin light chain messenger RNA.

    PubMed Central

    Hamlyn, P H; Gillam, S; Smith, M; Milstein, C

    1977-01-01

    Using an oligonucleotide d(pT10-C-A) as primer, cDNA has been transcribed from the 3' non-coding region of mouse immunoglobulin light chain mRNA and sequenced by a modification of the 'plus-minus' gel method. The sequence obtained has partially corrected and extended a previously obtained sequence. The new data contains an unusual sequence in which a trinucleotide is repeated seven times. PMID:405661

  3. Sequences encoding identical peptides for the analysis and manipulation of coding DNA

    PubMed Central

    Sánchez, Joaquín

    2013-01-01

    The use of sequences encoding identical peptides (SEIP) for the in silico analysis of coding DNA from different species has not been reported; the study of such sequences could directly reveal properties of coding DNA that are independent of peptide sequences. For practical purposes SEIP might also be manipulated for e.g. heterologous protein expression. We extracted 1,551 SEIP from human and E. coli and 2,631 SEIP from human and D. melanogaster. We then analyzed codon usage and intercodon dinucleotide tendencies and found differences in both, with more conspicuous disparities between human and E. coli than between human and D. melanogaster. We also briefly manipulated SEIP to find out if they could be used to create new coding sequences. We hence attempted replacement of human by E. coli codons via dicodon exchange but found that full replacement was not possible, this indicated robust species-specific dicodon tendencies. To test another form of codon replacement we isolated SEIP from human and the jellyfish green fluorescent protein (GFP) and we then re-constructed the GFP coding DNA with human tetra-peptide-coding sequences. Results provide proof-of-principle that SEIP may be used to reveal differences in the properties of coding DNA and to reconstruct in pieces a protein coding DNA with sequences from a different organism, the latter might be exploited in heterologous protein expression. PMID:23861567
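
    The idea of sequences encoding identical peptides can be sketched in a few lines. The example below is a simplified illustration, not the author's pipeline: the function names, the toy sequences and the default peptide length of four residues are assumptions, and both inputs are assumed to be in-frame coding DNA written with the letters A, C, G and T.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate in-frame coding DNA; a trailing partial codon is ignored."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def peptide_index(cds, k):
    """Map each k-residue peptide to the DNA segments that encode it."""
    protein = translate(cds)
    index = {}
    for i in range(len(protein) - k + 1):
        peptide = protein[i:i + k]
        if "*" in peptide:  # skip windows that span a stop codon
            continue
        index.setdefault(peptide, []).append(cds[3 * i:3 * (i + k)])
    return index


def shared_peptides(cds_a, cds_b, k=4):
    """Return peptides encoded by both sequences, with the DNA from each."""
    index_a = peptide_index(cds_a, k)
    index_b = peptide_index(cds_b, k)
    return {p: (index_a[p], index_b[p]) for p in index_a.keys() & index_b.keys()}


# Two toy sequences that encode the same peptide with different codons.
pairs = shared_peptides("ATGGCTAAAGGT", "ATGGCCAAGGGC")
```

    Here both toy sequences translate to the peptide MAKG, so the result pairs the two different DNA segments that encode it; codon usage can then be compared on such pairs.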

  4. Sequences encoding identical peptides for the analysis and manipulation of coding DNA.

    PubMed

    Sánchez, Joaquín

    2013-01-01

    The use of sequences encoding identical peptides (SEIP) for the in silico analysis of coding DNA from different species has not been reported; the study of such sequences could directly reveal properties of coding DNA that are independent of peptide sequences. For practical purposes SEIP might also be manipulated for e.g. heterologous protein expression. We extracted 1,551 SEIP from human and E. coli and 2,631 SEIP from human and D. melanogaster. We then analyzed codon usage and intercodon dinucleotide tendencies and found differences in both, with more conspicuous disparities between human and E. coli than between human and D. melanogaster. We also briefly manipulated SEIP to find out if they could be used to create new coding sequences. We hence attempted replacement of human by E. coli codons via dicodon exchange but found that full replacement was not possible, this indicated robust species-specific dicodon tendencies. To test another form of codon replacement we isolated SEIP from human and the jellyfish green fluorescent protein (GFP) and we then re-constructed the GFP coding DNA with human tetra-peptide-coding sequences. Results provide proof-of-principle that SEIP may be used to reveal differences in the properties of coding DNA and to reconstruct in pieces a protein coding DNA with sequences from a different organism, the latter might be exploited in heterologous protein expression. PMID:23861567

  5. Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences]

    NASA Technical Reports Server (NTRS)

    Gatlin, L. L.

    1974-01-01

    Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.
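
    The two randomness measures named in this abstract (uneven amino acid composition and uneven pair frequencies) can be expressed with standard information-theory quantities. The sketch below is one common formulation, given as an assumption because the abstract does not state its formulas: d1 is the divergence from equal composition, d2 the divergence from independence of neighbouring residues, and the redundancy is their sum divided by the maximum entropy for a 20-letter alphabet. All names are hypothetical.

```python
import math
from collections import Counter

ALPHABET_SIZE = 20  # standard amino acids


def entropy(counts):
    """Shannon entropy in bits of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def redundancy_parameters(protein):
    """Return (d1, d2, redundancy) for a protein sequence string."""
    if len(protein) < 2:
        raise ValueError("need at least two residues")
    h_max = math.log2(ALPHABET_SIZE)
    h1 = entropy(Counter(protein))
    pairs = Counter(zip(protein, protein[1:]))   # adjacent residue pairs
    firsts = Counter(protein[:-1])               # first member of each pair
    h_cond = entropy(pairs) - entropy(firsts)    # entropy of next residue given current
    d1 = h_max - h1     # deviation from equal amino acid frequencies
    d2 = h1 - h_cond    # deviation from independent neighbours
    return d1, d2, (d1 + d2) / h_max


# A strictly alternating toy sequence is fully predictable from its neighbours.
d1, d2, redundancy = redundancy_parameters("AGAGAG")
```

    For short sequences these estimates are strongly affected by sample size, which is consistent with the abstract's remark about constraints due to short chain length.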

  6. Multi-criterial coding sequence prediction. Combination of GeneMark with two novel, coding-character specific quantities.

    PubMed

    Almirantis, Yannis; Nikolaou, Christoforos

    2005-10-01

    This work applies two recently formulated quantities, strongly correlated with the coding character of a sequence, as an additional "module" on GeneMark, in a three-criterial method. The difference in the statistical approaches implicated by the methods combined here, is expected to contribute to an efficient assignment of functionality to unannotated genomic sequences. The developed combined algorithm is used to fractionalize a collection of GeneMark-predicted exons into sub-collections of different expectation to be coding. A further modification of the algorithm allows for the assignment of an improved estimation of the probability to be coding, to GeneMark-predicted exons. This is on the basis of a suitable training set of GeneMark-predicted exons of known functionality. PMID:15809100

  7. Complete cDNA and derived amino acid sequence of human factor V

    SciTech Connect

    Jenny, R.J.; Pittman, D.D.; Toole, J.J.; Kriz, R.W.; Aldape, R.A.; Hewick, R.M.; Kaufman, R.J.; Mann, K.G.

    1987-07-01

    cDNA clones encoding human factor V have been isolated from an oligo(dT)-primed human fetal liver cDNA library prepared with vector Charon 21A. The cDNA sequence of factor V from three overlapping clones includes a 6672-base-pair (bp) coding region, a 90-bp 5' untranslated region, and a 163-bp 3' untranslated region within which is a poly(A) tail. The deduced amino acid sequence consists of 2224 amino acids inclusive of a 28-amino acid leader peptide. Direct comparison with human factor VIII reveals considerable homology between proteins in amino acid sequence and domain structure: a triplicated A domain and duplicated C domain show approx. 40% identity with the corresponding domains in factor VIII. As in factor VIII, the A domains of factor V share approx. 40% amino acid-sequence homology with the three highly conserved domains in ceruloplasmin. The B domain of factor V contains 35 tandem and approx. 9 additional semiconserved repeats of nine amino acids of the form Asp-Leu-Ser-Gln-Thr-Thr/Asn-Leu-Ser-Pro and 2 additional semiconserved repeats of 17 amino acids. Factor V contains 37 potential N-linked glycosylation sites, 25 of which are in the B domain, and a total of 19 cysteine residues.

  8. Relation between mRNA expression and sequence information in Desulfovibrio vulgaris: Combinatorial contributions of upstream regulatory motifs and coding sequence features to variations in mRNA abundance

    SciTech Connect

    Wu, Gang; Nie, Lei; Zhang, Weiwen

    2006-05-26

    The context-dependent expression of genes is the core for biological activities, and significant attention has been given to identification of various factors contributing to gene expression at genomic scale. However, so far this type of analysis has focused either on the relation between mRNA expression and non-coding sequence features such as upstream regulatory motifs or on the correlation between mRNA abundance and non-random features in coding sequences (e.g. codon usage and amino acid usage). In this study multiple regression analyses of the mRNA abundance and all sequence information in Desulfovibrio vulgaris were performed, with the goal to investigate how much coding and non-coding sequence features contribute to the variations in mRNA expression, and in what manner they act together...

  9. Peculiar symmetry of DNA sequences and evidence suggesting its evolutionary origin in a primeval genetic code

    NASA Astrophysics Data System (ADS)

    Jolivet, R.; Rothen, F.

    2001-08-01

    Statistical analysis of the distribution of codons in DNA coding sequences of bacteria or archaea suggests that, at some stage of the prebiotic world, the most successful RNA replicating sequences afforded some tendency toward a weak form of palindromic symmetry, namely complementary symmetry. As a consequence, as soon as the machinery allowing translation into proteins was beginning to settle, we assume that primeval versions of the genetic code essentially consisted of pairs of sense-antisense codons. Present-day DNA sequences display footprints of this early symmetry, provided that statistics are made over coding sequences issued from groups of organisms and not only from the genome of an individual species. These fossil traces are proven to be significant from the statistical point of view. They shed some light onto the possible evolution of the genetic code and set some constraints on the way it had to follow.
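
    The sense-antisense pairing discussed in this abstract can be made concrete by counting each codon together with its reverse complement. The sketch below only tabulates the paired counts; it is an illustration with hypothetical function names, not the authors' statistical test, and it assumes in-frame coding sequences written with the letters A, C, G and T.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def codon_counts(coding_sequences):
    """Count in-frame codons pooled over several coding sequences."""
    counts = Counter()
    for cds in coding_sequences:
        usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            counts[cds[i:i + 3]] += 1
    return counts


def sense_antisense_pairs(counts):
    """Pair every observed codon with its reverse complement.

    Keys are alphabetically sorted codon pairs; values are the two counts.
    """
    pairs = {}
    for codon in counts:
        key = tuple(sorted((codon, reverse_complement(codon))))
        pairs[key] = (counts[key[0]], counts[key[1]])
    return pairs


# Toy example: ATG pairs with CAT, and GCC pairs with GGC.
table = sense_antisense_pairs(codon_counts(["ATGGCCGGCCAT"]))
```

    Under complementary symmetry, the two counts in each pair would be expected to be similar when pooled over many coding sequences, as the abstract suggests for groups of organisms.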

  10. Segments of amino acid sequence similarity in beta-amylases.

    PubMed

    Friedberg, F; Rhodes, C

    1988-01-01

    In alpha-amylases from animals, plants and bacteria and in beta-amylases from plants and bacteria a number of segments exhibit amino acid sequence similarity specific to the alpha or to the beta type, respectively. In the case of the beta-amylases the similar sequence regions are extensive and they are disrupted only by short interspersed dissimilar regions. Close to the C terminus, however, no such sequence similarity exists. PMID:2464171

  11. Transcriptome Sequencing in Response to Salicylic Acid in Salvia miltiorrhiza

    PubMed Central

    Zhang, Xiaoru; Dong, Juane; Liu, Hailong; Wang, Jiao; Qi, Yuexin; Liang, Zongsuo

    2016-01-01

    Salvia miltiorrhiza is a traditional Chinese herbal medicine, whose quality and yield are often affected by diseases and environmental stresses during its growing season. Salicylic acid (SA) plays a significant role in plants responding to biotic and abiotic stresses, but the involved regulatory factors and their signaling mechanisms are largely unknown. In order to identify the genes involved in SA signaling, the RNA sequencing (RNA-seq) strategy was employed to evaluate the transcriptional profiles in S. miltiorrhiza cell cultures. A total of 50,778 unigenes were assembled, in which 5,316 unigenes were differentially expressed among 0-, 2-, and 8-h SA induction. The up-regulated genes were mainly involved in stimulus response and multi-organism process. A core set of candidate novel genes coding SA signaling component proteins was identified. Many transcription factors (e.g., WRKY, bHLH and GRAS) and genes involved in hormone signal transduction were differentially expressed in response to SA induction. Detailed analysis revealed that genes associated with defense signaling, such as antioxidant system genes, cytochrome P450s and ATP-binding cassette transporters, were significantly overexpressed, which can be used as genetic tools to investigate disease resistance. Our transcriptome analysis will help understand SA signaling and its mechanism of defense systems in S. miltiorrhiza. PMID:26808150

  12. Transcriptome Sequencing in Response to Salicylic Acid in Salvia miltiorrhiza.

    PubMed

    Zhang, Xiaoru; Dong, Juane; Liu, Hailong; Wang, Jiao; Qi, Yuexin; Liang, Zongsuo

    2016-01-01

    Salvia miltiorrhiza is a traditional Chinese herbal medicine, whose quality and yield are often affected by diseases and environmental stresses during its growing season. Salicylic acid (SA) plays a significant role in plants responding to biotic and abiotic stresses, but the involved regulatory factors and their signaling mechanisms are largely unknown. In order to identify the genes involved in SA signaling, the RNA sequencing (RNA-seq) strategy was employed to evaluate the transcriptional profiles in S. miltiorrhiza cell cultures. A total of 50,778 unigenes were assembled, in which 5,316 unigenes were differentially expressed among 0-, 2-, and 8-h SA induction. The up-regulated genes were mainly involved in stimulus response and multi-organism process. A core set of candidate novel genes coding SA signaling component proteins was identified. Many transcription factors (e.g., WRKY, bHLH and GRAS) and genes involved in hormone signal transduction were differentially expressed in response to SA induction. Detailed analysis revealed that genes associated with defense signaling, such as antioxidant system genes, cytochrome P450s and ATP-binding cassette transporters, were significantly overexpressed, which can be used as genetic tools to investigate disease resistance. Our transcriptome analysis will help understand SA signaling and its mechanism of defense systems in S. miltiorrhiza. PMID:26808150

  13. Stochastic model of homogeneous coding and latent periodicity in DNA sequences.

    PubMed

    Chaley, Maria; Kutyrkin, Vladimir

    2016-02-01

    The concept of latent triplet periodicity in coding DNA sequences, which has been extensively discussed earlier, is confirmed by analysis of a number of eukaryotic genomes, where latent periodicity of a new type, called profile periodicity, is recognized in the CDSs. An original model of Stochastic Homogeneous Organization of Coding (SHOC-model) in a textual string is proposed. This model explains the existence of latent profile periodicity and regularity in DNA sequences. PMID:26656186

  14. A machine learning strategy to identify candidate binding sites in human protein-coding sequence

    PubMed Central

    Down, Thomas; Leong, Bernard; Hubbard, Tim JP

    2006-01-01

    Background: The splicing of RNA transcripts is thought to be partly promoted and regulated by sequences embedded within exons. Known sequences include binding sites for SR proteins, which are thought to mediate interactions between splicing factors bound to the 5' and 3' splice sites. It would be useful to identify further candidate sequences; however, identifying them computationally is hard since exon sequences are also constrained by their functional role in coding for proteins. Results: This strategy identified a collection of motifs including several previously reported splice enhancer elements. Although only trained on coding exons, the model discriminates both coding and non-coding exons from intragenic sequence. Conclusion: We have trained a computational model able to detect signals in coding exons which seem to be orthogonal to the sequences' primary function of coding for proteins. We believe that many of the motifs detected here represent binding sites for both previously unrecognized proteins which influence RNA splicing as well as other regulatory elements. PMID:17002805

  15. Coherent direct sequence optical code multiple access encoding-decoding efficiency versus wavelength detuning.

    PubMed

    Pastor, D; Amaya, W; García-Olcina, R; Sales, S

    2007-07-01

    We present a simple theoretical model of and the experimental verification for vanishing of the autocorrelation peak due to wavelength detuning on the coding-decoding process of coherent direct sequence optical code multiple access systems based on a superstructured fiber Bragg grating. Moreover, the detuning vanishing effect has been explored to take advantage of this effect and to provide an additional degree of multiplexing and/or optical code tuning. PMID:17603606

  16. The Coding and Effector Transfer of Movement Sequences

    ERIC Educational Resources Information Center

    Kovacs, Attila J.; Muhlbauer, Thomas; Shea, Charles H.

    2009-01-01

    Three experiments utilizing a 14-element arm movement sequence were designed to determine if reinstating the visual-spatial coordinates, which require movements to the same spatial locations utilized during acquisition, results in better effector transfer than reinstating the motor coordinates, which require the same pattern of homologous muscle…

  17. Cloning and DNA sequence of the gene coding for Clostridium thermocellum cellulase Ss (CelS), a major cellulosome component.

    PubMed Central

    Wang, W K; Kruus, K; Wu, J H

    1993-01-01

    Clostridium thermocellum ATCC 27405 produces an extracellular cellulase system capable of hydrolyzing crystalline cellulose. The enzyme system involves a multicomponent protein aggregate (the cellulosome) with a total molecular weight in the millions, impeding mechanistic studies. However, two major components of the aggregate, SS (M(r) = 82,000) and SL (M(r) = 250,000), which act synergistically to hydrolyze crystalline cellulose, have been identified (J. H. D. Wu, W. H. Orme-Johnson, and A. L. Demain, Biochemistry 27:1703-1709, 1988). To further study this synergism, we cloned and sequenced the gene (celS) coding for the SS (CelS) protein by using a degenerate, inosine-containing oligonucleotide probe whose sequence was derived from the N-terminal amino acid sequence of the CelS protein. The open reading frame of celS consisted of 2,241 bp encoding 741 amino acid residues. It encoded the N-terminal amino acid sequence and two internal peptide sequences determined for the native CelS protein. A putative ribosome binding site was identified at the 5' end of the gene. A putative signal peptide of 27 amino acid residues was adjacent to the N terminus of the CelS protein. The predicted molecular weight of the secreted protein was 80,670. The celS gene contained a conserved reiterated sequence encoding 24 amino acid residues found in proteins encoded by many other clostridial cel or xyn genes. A palindromic structure was found downstream from the open reading frame. The celS gene is unique among the known cel genes of C. thermocellum. However, it is highly homologous to the partial open reading frame found in C. cellulolyticum and in Caldocellum saccharolyticum, indicating that these genes belong to a new family of cel genes. PMID:8444792

  18. Do Intron and Coding Sequences of Some Human-Mouse Orthologs Evolve as a Single Unit?

    PubMed

    Fuertes, Miguel Angel; Rodrigo, José Ramón; Alonso, Carlos

    2016-06-01

    It has been previously suggested that both the coding and the associated non-coding sequences of some human-mouse orthologs could evolve as a single unit. This letter deals with the observation that between mouse and humans some orthologs change significantly their compositional features as an indication that the molecular evolution is a local process. Moreover, the data shown indicate that the coding and the intron sequences of these orthologs do not evolve independently but instead both undergo a concerted evolution, evolving as a single unit, from a compositional cluster in mouse to a different compositional cluster in human. PMID:27220874

  19. Cloning and nucleotide sequence of the simian rotavirus gene 6 that codes for the major inner capsid protein.

    PubMed Central

    Estes, M K; Mason, B B; Crawford, S; Cohen, J

    1984-01-01

    The nucleotide sequence of the gene that codes for the major inner capsid protein of the simian rotavirus SA11 has been determined. A DNA copy of mRNA from gene 6 was cloned in the E. coli plasmid pBR322. The full-length gene is 1357 nucleotides long with a 5'-noncoding region of 23 nucleotides and a 3'-noncoding region of 140 nucleotides. The gene contains a single, long, open reading-frame of 1194 nucleotides capable of coding for a protein of 397 amino acids with a molecular weight of 44,816. The predicted protein product is relatively proline-rich with a net charge at neutral pH of -3.5. One stretch of 53 amino acids (encoded by nucleotides 327-485) is basic. PMID:6322125
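
    The lengths quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the stated open reading frame length includes the stop codon, which the abstract does not say explicitly.

```python
# Figures quoted in the abstract for simian rotavirus SA11 gene 6.
GENE_LENGTH = 1357       # full-length gene, nucleotides
FIVE_PRIME_UTR = 23      # 5'-noncoding region, nucleotides
THREE_PRIME_UTR = 140    # 3'-noncoding region, nucleotides
ORF_LENGTH = 1194        # open reading frame, nucleotides
PROTEIN_LENGTH = 397     # predicted protein, amino acids


def orf_from_gene(gene_len, utr5, utr3):
    """Length left for the reading frame once both noncoding ends are removed."""
    return gene_len - utr5 - utr3


def residues_from_orf(orf_len, includes_stop=True):
    """Number of amino acids encoded by a reading frame of orf_len nucleotides.

    Assumption (not stated in the abstract): when includes_stop is True,
    the final codon is a stop codon and encodes no residue.
    """
    if orf_len % 3 != 0:
        raise ValueError("reading frame length must be a multiple of 3")
    codons = orf_len // 3
    return codons - 1 if includes_stop else codons


# Both checks pass with the numbers given in the abstract.
assert orf_from_gene(GENE_LENGTH, FIVE_PRIME_UTR, THREE_PRIME_UTR) == ORF_LENGTH
assert residues_from_orf(ORF_LENGTH) == PROTEIN_LENGTH
```

    Under that assumption the figures are mutually consistent: 1357 - 23 - 140 = 1194 nucleotides, and 1194 / 3 = 398 codons, one of which is the stop codon.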

  20. Classification of Arabidopsis thaliana gene sequences: clustering of coding sequences into two groups according to codon usage improves gene prediction.

    PubMed

    Mathé, C; Peresetsky, A; Déhais, P; Van Montagu, M; Rouzé, P

    1999-02-01

    While genomic sequences are accumulating, finding the location of the genes remains a major issue that can be solved only for about a half of them by homology searches. Prediction methods are thus required, but unfortunately are not fully satisfying. Most prediction methods implicitly assume a unique model for genes. This is an oversimplification as demonstrated by the possibility to group coding sequences into several classes in Escherichia coli and other genomes. As no classification existed for Arabidopsis thaliana, we classified genes according to the statistical features of their coding sequences. A clustering algorithm using a codon usage model was developed and applied to coding sequences from A. thaliana, E. coli, and a mixture of both. By using it, Arabidopsis sequences were clustered into two classes. The CU1 and CU2 classes differed essentially by the choice of pyrimidine bases at the codon silent sites: CU2 genes often use C whereas CU1 genes prefer T. This classification discriminated the Arabidopsis genes according to their expressiveness, highly expressed genes being clustered in CU2 and genes expected to have a lower expression, such as the regulatory genes, in CU1. The algorithm separated the sequences of the Escherichia-Arabidopsis mixed data set into five classes according to the species, except for one class. This mixed class contained 89 % Arabidopsis genes from CU1 and 11 % E. coli genes, mostly horizontally transferred. Interestingly, most genes encoding organelle-targeted proteins, except the photosynthetic and photoassimilatory ones, were clustered in CU1. By tailoring the GeneMark CDS prediction algorithm to the observed coding sequence classes, its quality of prediction was greatly improved. Similar improvement can be expected with other prediction systems. PMID:9925779

  1. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... acids are not intended to be embraced by this definition. Any amino acid sequence that contains post-translationally modified amino acids may be described as the amino acid sequence that is initially translated... sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides....

  2. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... acids are not intended to be embraced by this definition. Any amino acid sequence that contains post-translationally modified amino acids may be described as the amino acid sequence that is initially translated... sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides....

  3. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... acids are not intended to be embraced by this definition. Any amino acid sequence that contains post-translationally modified amino acids may be described as the amino acid sequence that is initially translated... sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides....

  4. A method to find palindromes in nucleic acid sequences.

    PubMed

    Anjana, Ramnath; Shankar, Mani; Vaishnavi, Marthandan Kirti; Sekar, Kanagaraj

    2013-01-01

    Various types of sequences in the human genome are known to play important roles in different aspects of genomic functioning. Among these sequences, palindromic nucleic acid sequences are one such type that have been studied in detail and found to influence a wide variety of genomic characteristics. For a nucleotide sequence to be considered as a palindrome, its complementary strand must read the same in the opposite direction. For example, both the strands, i.e., the strand going from 5' to 3' and its complementary strand from 3' to 5', must be complementary. A typical nucleotide palindromic sequence would be TATA (5' to 3') and its complementary sequence from 3' to 5' would be ATAT. Thus, a new method has been developed using dynamic programming to fetch the palindromic nucleic acid sequences. The new method uses less memory and thereby it increases the overall speed and efficiency. The proposed method has been tested using the bacterial (3891 KB bases) and human chromosomal sequences (Chr-18: 74366 kb and Chr-Y: 25554 kb) and the computation time for finding the palindromic sequences is in milliseconds. PMID:23515654
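
    The palindrome definition in this abstract (a sequence equal to its own reverse complement) can be sketched in a few lines. This is a naive window scan written for illustration, not the dynamic-programming method the authors describe; the function names and the `min_len`/`max_len` parameters are hypothetical.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand read in the 5' to 3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindrome(seq):
    """True if the sequence reads the same on both strands (5' to 3')."""
    seq = seq.upper()
    return len(seq) > 0 and seq == reverse_complement(seq)


def find_palindromes(seq, min_len=4, max_len=12):
    """Naive scan returning (start, palindrome) pairs.

    Only even window lengths are tried: no base is its own complement,
    so a reverse-complement palindrome cannot have a middle base.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq)):
        for length in range(min_len, max_len + 1, 2):
            end = start + length
            if end > len(seq):
                break
            window = seq[start:end]
            if is_palindrome(window):
                hits.append((start, window))
    return hits


# TATA from the abstract, plus the EcoRI site GAATTC, are both palindromic.
assert is_palindrome("TATA")
assert is_palindrome("GAATTC")
assert find_palindromes("CCGAATTCGG", min_len=6, max_len=6) == [(2, "GAATTC")]
```

    The scan costs time proportional to sequence length times the number of window lengths tried, so it is only suitable for short inputs.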

  5. Comparison of Exome and Genome Sequencing Technologies for the Complete Capture of Protein‐Coding Regions

    PubMed Central

    Lelieveld, Stefan H.; Spielmann, Malte; Mundlos, Stefan; Veltman, Joris A.

    2015-01-01

    ABSTRACT For next‐generation sequencing technologies, sufficient base‐pair coverage is the foremost requirement for the reliable detection of genomic variants. We investigated whether whole‐genome sequencing (WGS) platforms offer improved coverage of coding regions compared with whole‐exome sequencing (WES) platforms, and compared single‐base coverage for a large set of exome and genome samples. We find that WES platforms have improved considerably in the last years, but at comparable sequencing depth, WGS outperforms WES in terms of covered coding regions. At higher sequencing depth (95x–160x), WES successfully captures 95% of the coding regions with a minimal coverage of 20x, compared with 98% for WGS at 87‐fold coverage. Three different assessments of sequence coverage bias showed consistent biases for WES but not for WGS. We found no clear differences for the technologies concerning their ability to achieve complete coverage of 2,759 clinically relevant genes. We show that WES performs comparably to WGS in terms of covered bases if sequenced at two to three times higher coverage. This does, however, come at the cost of substantially more sequencing biases in WES approaches. Our findings will guide laboratories to make an informed decision on which sequencing platform and coverage to choose. PMID:25973577

  6. Packet error probabilities in direct sequence spread spectrum packet radio networks with BCH codes

    NASA Astrophysics Data System (ADS)

    Georgiopoulos, Michael

    The author computes an upper bound on the packet error probability induced in direct-sequence spread-spectrum networks, when BCH codes are used for the encoding of the packets. The bound, which is introduced here, is valid independently of whether signals arrive with equal or unequal powers at the receiver site. Furthermore, it has a simple form and is easy to compute. In addition, it is valid for other classes of forward error correction codes (e.g., convolutional codes). However, numerical results are presented for BCH codes only.

  7. Evaluation of correlation property of linear-frequency-modulated signals coded by maximum-length sequences

    NASA Astrophysics Data System (ADS)

    Yamanaka, Kota; Hirata, Shinnosuke; Hachiya, Hiroyuki

    2016-07-01

    Ultrasonic distance measurement for obstacles has been recently applied in automobiles. The pulse–echo method based on the transmission of an ultrasonic pulse and time-of-flight (TOF) determination of the reflected echo is one of the typical methods of ultrasonic distance measurement. Improvement of the signal-to-noise ratio (SNR) of the echo and the avoidance of crosstalk between ultrasonic sensors in the pulse–echo method are required in automotive measurement. The SNR of the reflected echo and the resolution of the TOF are improved by the employment of pulse compression using a maximum-length sequence (M-sequence), which is one of the binary pseudorandom sequences generated from a linear feedback shift register (LFSR). Crosstalk is avoided by using transmitted signals coded by different M-sequences generated from different LFSRs. In the case of lower-order M-sequences, however, the number of measurement channels corresponding to the pattern of the LFSR is not enough. In this paper, pulse compression using linear-frequency-modulated (LFM) signals coded by M-sequences has been proposed. The coding of LFM signals by the same M-sequence can produce different transmitted signals and increase the number of measurement channels. In the proposed method, however, the truncation noise in autocorrelation functions and the interference noise in cross-correlation functions degrade the SNRs of received echoes. Therefore, autocorrelation properties and cross-correlation properties in all patterns of combinations of coded LFM signals are evaluated.
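
    The M-sequences mentioned in this abstract can be generated with a small shift-register simulation. The sketch below covers only the plain M-sequence and its periodic autocorrelation, not the LFM coding the paper proposes. The 4-stage register, the tap choice (stages 4 and 3, corresponding to the primitive polynomial x^4 + x + 1) and the function names are assumptions made for illustration.

```python
def m_sequence(taps, n_bits, seed=1):
    """Generate one period of a binary sequence from a Fibonacci LFSR.

    taps   -- 1-indexed register stages XORed to form the feedback bit
    n_bits -- number of register stages
    seed   -- nonzero initial register contents, as an integer
    The period is 2**n_bits - 1 only when the taps correspond to a
    primitive polynomial; (4, 3) with n_bits=4 is one such choice.
    """
    state = [(seed >> i) & 1 for i in range(n_bits)]
    if not any(state):
        raise ValueError("seed must be nonzero")
    out = []
    for _ in range(2 ** n_bits - 1):
        out.append(state[-1])              # output the last stage
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]    # shift by one stage
    return out


def periodic_autocorrelation(bits):
    """Circular autocorrelation after mapping bits 0/1 to chips +1/-1."""
    chips = [1 - 2 * b for b in bits]
    n = len(chips)
    return [sum(chips[i] * chips[(i + lag) % n] for i in range(n))
            for lag in range(n)]


seq = m_sequence(taps=(4, 3), n_bits=4)
acf = periodic_autocorrelation(seq)
# An M-sequence of length N has autocorrelation N at zero lag and -1 elsewhere.
assert acf[0] == 15 and all(v == -1 for v in acf[1:])
```

    The sharp zero-lag peak is what makes pulse compression with such codes attractive. Truncated (aperiodic) correlations, as discussed in the abstract, do not have this ideal two-valued form.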

  8. Amino acid sequence repertoire of the bacterial proteome and the occurrence of untranslatable sequences.

    PubMed

    Navon, Sharon Penias; Kornberg, Guy; Chen, Jin; Schwartzman, Tali; Tsai, Albert; Puglisi, Elisabetta Viani; Puglisi, Joseph D; Adir, Noam

    2016-06-28

    Bioinformatic analysis of Escherichia coli proteomes revealed that all possible amino acid triplet sequences occur at their expected frequencies, with four exceptions. Two of the four underrepresented sequences (URSs) were shown to interfere with translation in vivo and in vitro. Enlarging the URS by a single amino acid resulted in increased translational inhibition. Single-molecule methods revealed stalling of translation at the entrance of the peptide exit tunnel of the ribosome, adjacent to ribosomal nucleotides A2062 and U2585. Interaction with these same ribosomal residues is involved in regulation of translation by longer, naturally occurring protein sequences. The E. coli exit tunnel has evidently evolved to minimize interaction with the exit tunnel and maximize the sequence diversity of the proteome, although allowing some interactions for regulatory purposes. Bioinformatic analysis of the human proteome revealed no underrepresented triplet sequences, possibly reflecting an absence of regulation by interaction with the exit tunnel. PMID:27307442
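
    A comparison of observed and expected amino acid triplet counts, in the spirit of the analysis summarized above, might be sketched as follows. The abstract does not say how expected frequencies were computed; this sketch assumes independent residues (the product of single-residue frequencies), and the function names and the `ratio` cutoff are hypothetical.

```python
from collections import Counter
from itertools import product


def triplet_counts(proteins):
    """Count overlapping amino acid triplets across all sequences."""
    counts = Counter()
    for p in proteins:
        for i in range(len(p) - 2):
            counts[p[i:i + 3]] += 1
    return counts


def expected_counts(proteins):
    """Expected triplet counts if residues were independent (an assumption)."""
    residues = Counter("".join(proteins))
    total = sum(residues.values())
    n_triplets = sum(max(len(p) - 2, 0) for p in proteins)
    freq = {aa: c / total for aa, c in residues.items()}
    return {a + b + c: n_triplets * freq[a] * freq[b] * freq[c]
            for a, b, c in product(freq, repeat=3)}


def underrepresented(proteins, ratio=0.5):
    """Triplets whose observed/expected ratio falls below the cutoff."""
    observed = triplet_counts(proteins)
    expected = expected_counts(proteins)
    return sorted(t for t, e in expected.items()
                  if e > 0 and observed.get(t, 0) / e < ratio)


# Toy input only; a real analysis would use a full proteome.
toy = ["AAAB"]
assert triplet_counts(toy) == Counter({"AAA": 1, "AAB": 1})
assert "ABA" in underrepresented(toy)
```

    On a real proteome the cutoff would need a statistical test rather than a fixed ratio, since rare residues give small expected counts with large sampling noise.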

  9. Sequence of the nifD gene coding for the α subunit of dinitrogenase from the cyanobacterium Anabaena

    PubMed Central

    Lammers, Peter J.; Haselkorn, Robert

    1983-01-01

    The nucleotide sequence of nifD, the structural gene for the α subunit of dinitrogenase from Anabaena 7120, has been determined. The coding sequence contains 1,440 nucleotides, which predict an amino acid sequence of 480 residues and Mr of 54,283. The predicted sequence contains eight cysteines, of which five are conserved with respect to adjoining sequences and position relative to the α subunits of dinitrogenase from Azotobacter, Clostridium, and Klebsiella. Because there are also five conserved cysteines in the β subunit of Anabaena dinitrogenase [Mazur, B. J. & Chiu, C.-F. (1982) Proc. Natl. Acad. Sci. USA 79, 6782-6786], the number of cysteine residues participating as ligands to FeS clusters is likely to be 20 per α2β2 tetramer. This number is sufficient to accommodate the known four Fe4S4 clusters, leaving at least four cysteines to be shared among the two FeMo cofactors and the more poorly characterized two-iron center. Although the α- and β-subunit gene sequences are not recognizably homologous, their secondary structures, predicted from the sequences, indicate similar domains around three of the conserved cysteine residues. PMID:16593347

  10. Orpinomyces cellulase CelE protein and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2000-08-29

    A cDNA designated celE cloned from Orpinomyces PC-2 encodes a polypeptide (CelE) of 477 amino acids. CelE is highly homologous to CelB of Orpinomyces (72.3% identity) and Neocallimastix (67.9% identity), and like them, it has a non-catalytic repeated peptide domain (NCRPD) at the C-terminal end. The catalytic domain of CelE is homologous to glycosyl hydrolases of Family 5, found in several anaerobic bacteria. The gene of celE is devoid of introns. The recombinant proteins CelE and CelB of Orpinomyces PC-2 randomly hydrolyze carboxymethylcellulose and cello-oligosaccharides in the pattern of endoglucanases.

  11. In silico comparative analysis of DNA and amino acid sequences for prion protein gene.

    PubMed

    Kim, Y; Lee, J; Lee, C

    2008-01-01

    Genetic variability might contribute to species specificity of prion diseases in various organisms. In this study, structures of the prion protein gene (PRNP) and its amino acids were compared among species of which sequence data were available. Comparisons of PRNP DNA sequences among 12 species including human, chimpanzee, monkey, bovine, ovine, dog, mouse, rat, wallaby, opossum, chicken and zebrafish allowed us to identify candidate regulatory regions in intron 1 and 3'-untranslated region (UTR) in addition to the coding region. Highly conserved putative binding sites for transcription factors, such as heat shock factor 2 (HSF2) and myocyte enhancer factor 2 (MEF2), were discovered in the intron 1. In 3'-UTR, the functional sequence (ATTAAA) for nucleus-specific polyadenylation was found in all the analysed species. The functional sequence (TTTTTAT) for maturation-specific polyadenylation was identically observed only in ovine, and one or two nucleotide mismatches in the other species. A comparison of the amino acid sequences in 53 species revealed a large sequence identity. Especially the octapeptide repeat region was observed in all the species but frog and zebrafish. Functional changes and susceptibility to prion diseases with various isoforms of prion protein could be caused by numeric variability and conformational changes discovered in the repeat sequences. PMID:18397498

  12. On Quantum Algorithm for Multiple Alignment of Amino Acid Sequences

    NASA Astrophysics Data System (ADS)

    Iriyama, Satoshi; Ohya, Masanori

    2009-02-01

    The alignment of genome sequences or amino acid sequences is one of the fundamental operations for the study of life. The usual computational complexity for the multiple alignment of N sequences with common length L by dynamic programming is O(L^N). This alignment is considered as one of the NP problems, so that it is desirable to find a nice algorithm of the multiple alignment. Thus in this paper we propose the quantum algorithm for the multiple alignment based on the works [12, 1, 2] in which the NP complete problem was shown to be the P problem by means of quantum algorithm and chaos information dynamics.
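
    To make the cost of dynamic programming concrete, the sketch below scores a global alignment of two sequences with the classical Needleman-Wunsch recurrence, then counts how many table cells the same approach would need for N sequences. It illustrates the classical baseline only, not the quantum algorithm of the paper; the scoring values (match +1, mismatch -1, gap -1) and the function names are assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score for two sequences; fills (len(a)+1)*(len(b)+1) cells."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # prefix of a aligned to gaps only
    for j in range(1, cols):
        score[0][j] = j * gap          # prefix of b aligned to gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


def dp_table_cells(length, n_sequences):
    """Cells in the DP table when aligning n_sequences of a common length."""
    return (length + 1) ** n_sequences


assert global_alignment_score("GATTACA", "GATTACA") == 7
assert global_alignment_score("ACGT", "AGT") == 2   # three matches, one gap
assert dp_table_cells(100, 2) == 10201
assert dp_table_cells(100, 5) == 101 ** 5           # already over 10 billion cells
```

    Each added sequence multiplies the table size by roughly L, which is why exact multiple alignment by dynamic programming becomes impractical beyond a handful of sequences.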

  13. Prebiotically plausible mechanisms increase compositional diversity of nucleic acid sequences

    PubMed Central

    Derr, Julien; Manapat, Michael L.; Rajamani, Sudha; Leu, Kevin; Xulvi-Brunet, Ramon; Joseph, Isaac; Nowak, Martin A.; Chen, Irene A.

    2012-01-01

    During the origin of life, the biological information of nucleic acid polymers must have increased to encode functional molecules (the RNA world). Ribozymes tend to be compositionally unbiased, as is the vast majority of possible sequence space. However, ribonucleotides vary greatly in synthetic yield, reactivity and degradation rate, and their non-enzymatic polymerization results in compositionally biased sequences. While natural selection could lead to complex sequences, molecules with some activity are required to begin this process. Was the emergence of compositionally diverse sequences a matter of chance, or could prebiotically plausible reactions counter chemical biases to increase the probability of finding a ribozyme? Our in silico simulations using a two-letter alphabet show that template-directed ligation and high concatenation rates counter compositional bias and shift the pool toward longer sequences, permitting greater exploration of sequence space and stable folding. We verified experimentally that unbiased DNA sequences are more efficient templates for ligation, thus increasing the compositional diversity of the pool. Our work suggests that prebiotically plausible chemical mechanisms of nucleic acid polymerization and ligation could predispose toward a diverse pool of longer, potentially structured molecules. Such mechanisms could have set the stage for the appearance of functional activity very early in the emergence of life. PMID:22319215

  14. The amino-acid sequence of kangaroo pancreatic ribonuclease.

    PubMed

    Gaastra, W; Welling, G W; Beintema, J J

    1978-05-01

    Red kangaroo (Macropus rufus) ribonuclease was isolated from pancreatic tissue by affinity chromatography. The amino acid sequence was determined by automatic sequencing of overlapping large fragments and by analysis of shorter peptides obtained by digestion with a number of proteolytic enzymes. The polypeptide chain consists of 122 amino acid residues. Compared to other ribonucleases, the N-terminal residue and residue 114 are deleted. In other pancreatic ribonucleases position 114 is occupied by a cis proline residue in an external loop at the surface of the molecule. Other remarkable substitutions are the presence of a tyrosine residue at position 123 instead of a serine which forms a hydrogen bond with the pyrimidine ring of a nucleotide substrate, and a number of hydrophobic-hydrophilic interchanges in the sequence 51-55, which forms part of an alpha-helix in bovine ribonuclease and exhibits few substitutions in the placental mammals. Kangaroo ribonuclease contains no carbohydrate, although the enzyme possesses a recognition site for carbohydrate attachment in the sequence Asn-Val-Thr (62-64). The enzyme differs at about 35-40% of the positions from all other mammalian pancreatic ribonucleases sequenced to date, which is in agreement with the early divergence between the marsupials and the placental mammals. From fragmentary data a tentative sequence of red-necked wallaby (Macropus rufogriseus) pancreatic ribonuclease has been derived. Eight differences with the kangaroo sequence were found. PMID:658039

  15. Nucleotide and derived amino acid sequences of the major porin of Comamonas acidovorans and comparison of porin primary structures.

    PubMed Central

    Gerbl-Rieger, S; Peters, J; Kellermann, J; Lottspeich, F; Baumeister, W

    1991-01-01

    The DNA sequence of the gene which codes for the major outer membrane porin (Omp32) of Comamonas acidovorans has been determined. The structural gene encodes a precursor consisting of 351 amino acid residues with a signal peptide of 19 amino acid residues. Comparisons with amino acid sequences of outer membrane proteins and porins from several other members of the class Proteobacteria and of the Chlamydia trachomatis porin and the Neurospora crassa mitochondrial porin revealed a motif of eight regions of local homology. The results of this analysis are discussed with regard to common structural features of porins. PMID:1848840

  16. Amino acid codes in mitochondria as possible clues to primitive codes

    NASA Technical Reports Server (NTRS)

    Jukes, T. H.

    1981-01-01

    Differences between mitochondrial codes and the universal code indicate that an evolutionary simplification has taken place, rather than a return to a more primitive code. However, these differences make it evident that the universal code is not the only code possible, and therefore earlier codes may have differed markedly from the previous code. The present universal code is probably a 'frozen accident.' The change in CUN codons from leucine to threonine (Neurospora vs. yeast mitochondria) indicates that neutral or near-neutral changes occurred in the corresponding proteins when this code change took place, caused presumably by a mutation in a tRNA gene.

  17. Stability of the genetic code and optimal parameters of amino acids.

    PubMed

    Chechetkin, V R; Lobzin, V V

    2011-01-21

    The standard genetic code is known to be much more efficient in minimizing adverse effects of misreading errors and one-point mutations in comparison with a random code having the same structure, i.e. the same number of codons coding for each particular amino acid. We study the inverse problem, how the code structure affects the optimal physico-chemical parameters of amino acids ensuring the highest stability of the genetic code. It is shown that the choice of two or more amino acids with given properties determines unambiguously all the others. In this sense the code structure determines strictly the optimal parameters of amino acids or the corresponding scales may be derived directly from the genetic code. In the code with the structure of the standard genetic code the resulting values for hydrophobicity obtained in the scheme "leave one out" and in the scheme with fixed maximum and minimum parameters correlate significantly with the natural scale. The comparison of the optimal and natural parameters allows assessing relative impact of physico-chemical and error-minimization factors during evolution of the genetic code. As the resulting optimal scale depends on the choice of amino acids with given parameters, the technique can also be applied to testing various scenarios of the code evolution with increasing number of codified amino acids. Our results indicate the co-evolution of the genetic code and physico-chemical properties of recruited amino acids. PMID:20955716

  18. Severe accident source term characteristics for selected Peach Bottom sequences predicted by the MELCOR Code

    SciTech Connect

    Carbajo, J.J.

    1993-09-01

    The purpose of this report is to compare in-containment source terms developed for NUREG-1159, which used the Source Term Code Package (STCP), with those generated by MELCOR to identify significant differences. For this comparison, two short-term depressurized station blackout sequences (with a dry cavity and with a flooded cavity) and a Loss-of-Coolant Accident (LOCA) concurrent with complete loss of the Emergency Core Cooling System (ECCS) were analyzed for the Peach Bottom Atomic Power Station (a BWR-4 with a Mark I containment). The results indicate that for the sequences analyzed, the two codes predict similar total in-containment release fractions for each of the element groups. However, the MELCOR/CORBH Package predicts significantly longer times for vessel failure and reduced energy of the released material for the station blackout sequences (when compared to the STCP results). MELCOR also calculated smaller releases into the environment than STCP for the station blackout sequences.

  19. Coding-complete sequencing classifies parrot bornavirus 5 into a novel virus species.

    PubMed

    Marton, Szilvia; Bányai, Krisztián; Gál, János; Ihász, Katalin; Kugler, Renáta; Lengyel, György; Jakab, Ferenc; Bakonyi, Tamás; Farkas, Szilvia L

    2015-11-01

    In this study, we determined the sequence of the coding region of an avian bornavirus detected in a blue-and-yellow macaw (Ara ararauna) with pathological/histopathological changes characteristic of proventricular dilatation disease. The genomic organization of the macaw bornavirus is similar to that of other bornaviruses, and its nucleotide sequence is nearly identical to the available partial parrot bornavirus 5 (PaBV-5) sequences. Phylogenetic analysis showed that these strains formed a monophyletic group distinct from other mammalian and avian bornaviruses and in calculations performed with matrix protein coding sequences, the PaBV-5 and PaBV-6 genotypes formed a common cluster, suggesting that according to the recently accepted classification system for bornaviruses, these two genotypes may belong to a new species, provisionally named Psittaciform 2 bornavirus. PMID:26282234

  20. Is there an error correcting code in the base sequence in DNA?

    PubMed Central

    Liebovitch, L S; Tao, Y; Todorov, A T; Levine, L

    1996-01-01

    Modern methods of encoding information into digital form include error check digits that are functions of the other information digits. When digital information is transmitted, the values of the error check digits can be computed from the information digits to determine whether the information has been received accurately. These error correcting codes make it possible to detect and correct common errors in transmission. The sequence of bases in DNA is also a digital code consisting of four symbols: A, C, G, and T. Does DNA also contain an error correcting code? Such a code would allow repair enzymes to protect the fidelity of nonreplicating DNA and increase the accuracy of replication. If a linear block error correcting code is present in DNA then some bases would be a linear function of the other bases in each set of bases. We developed an efficient procedure to determine whether such an error correcting code is present in the base sequence. We illustrate the use of this procedure by using it to analyze the lac operon and the gene for cytochrome c. These genes do not appear to contain such a simple error correcting code. PMID:8874027
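    As a rough illustration of what "some bases would be a linear function of the other bases" could mean, the sketch below searches by brute force for a single linear check rule shared by every block of a sequence. It is not the efficient procedure developed by the authors. The base-to-integer mapping, the use of arithmetic modulo 4, and the function names are assumptions made for this example.

```python
from itertools import product

# Hypothetical numeric encoding of the four bases.
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}


def blocks(seq, n):
    """Split seq into consecutive, non-overlapping blocks of n bases (remainder dropped)."""
    return [seq[i:i + n] for i in range(0, len(seq) - n + 1, n)]


def find_check_rules(seq, n):
    """Return every coefficient tuple c such that, in each block of length n,
    the last base equals sum(c[i] * base[i]) mod 4 over the first n - 1 bases."""
    encoded = [[BASE_TO_INT[b] for b in blk] for blk in blocks(seq, n)]
    rules = []
    for coeffs in product(range(4), repeat=n - 1):
        if all(
            sum(c * x for c, x in zip(coeffs, blk[:-1])) % 4 == blk[-1]
            for blk in encoded
        ):
            rules.append(coeffs)
    return rules


if __name__ == "__main__":
    # Each third base is (first + second) mod 4 under the mapping above.
    print(find_check_rules("ACCCGTGGATCA", 3))
    # No single rule fits both blocks here.
    print(find_check_rules("ACAACC", 3))
```

    The search cost grows as 4^(n-1), so this form is only practical for short blocks; it is meant to make the definition concrete, not to be used on whole genes.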

  1. Biosynthesis of riboflavin: cloning, sequencing, and expression of the gene coding for 3,4-dihydroxy-2-butanone 4-phosphate synthase of Escherichia coli.

    PubMed Central

    Richter, G; Volk, R; Krieger, C; Lahm, H W; Röthlisberger, U; Bacher, A

    1992-01-01

    3,4-Dihydroxy-2-butanone 4-phosphate is biosynthesized from ribulose 5-phosphate and serves as the biosynthetic precursor for the xylene ring of riboflavin. The gene coding for 3,4-dihydroxy-2-butanone 4-phosphate synthase of Escherichia coli has been cloned and sequenced. The gene codes for a protein of 217 amino acid residues with a calculated molecular mass of 23,349.6 Da. The enzyme was purified to near homogeneity from a recombinant E. coli strain and had a specific activity of 1,700 nmol mg^-1 h^-1. The N-terminal amino acid sequence and the amino acid composition of the protein were in agreement with the deduced sequence. The molecular mass as determined by ion spray mass spectrometry was 23,351 +/- 2 Da, which is in agreement with the predicted mass. The previously reported loci htrP, "luxH-like," and ribB at 66 min of the E. coli chromosome are all identical to the gene coding for 3,4-dihydroxy-2-butanone 4-phosphate synthase, but their role had not been hitherto determined. Sequence homology indicates that gene luxH of Vibrio harveyi and the central open reading frame of the Bacillus subtilis riboflavin operon code for 3,4-dihydroxy-2-butanone 4-phosphate synthase. PMID:1597419

  2. mPUMA: a computational approach to microbiota analysis by de novo assembly of operational taxonomic units based on protein-coding barcode sequences

    PubMed Central

    2013-01-01

    Background: Formation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. The de novo assembly of OTU sequences has been recently demonstrated as an alternative to widely used clustering methods, providing robust information from experimental data alone, without any reliance on an external reference database. Results: Here we introduce mPUMA (microbial Profiling Using Metagenomic Assembly, http://mpuma.sourceforge.net), a software package for identification and analysis of protein-coding barcode sequence data. It was developed originally for Cpn60 universal target sequences (also known as GroEL or Hsp60). Using an unattended process that is independent of external reference sequences, mPUMA forms OTUs by DNA sequence assembly and is capable of tracking OTU abundance. mPUMA processes microbial profiles both in terms of the direct DNA sequence as well as in the translated amino acid sequence for protein coding barcodes. By forming OTUs and calculating abundance through an assembly approach, mPUMA is capable of generating inputs for several popular microbiota analysis tools. Using SFF data from sequencing of a synthetic community of Cpn60 sequences derived from the human vaginal microbiome, we demonstrate that mPUMA can faithfully reconstruct all expected OTU sequences and produce compositional profiles consistent with actual community structure. Conclusions: mPUMA enables analysis of microbial communities while empowering the discovery of novel organisms through OTU assembly. PMID:24451012

  3. Amino acid sequence of Salmonella typhimurium branched-chain amino acid aminotransferase.

    PubMed

    Feild, M J; Nguyen, D C; Armstrong, F B

    1989-06-13

    The complete amino acid sequence of the subunit of branched-chain amino acid aminotransferase (transaminase B, EC 2.6.1.42) of Salmonella typhimurium was determined. An Escherichia coli recombinant containing the ilvGEDAY gene cluster of Salmonella was used as the source of the hexameric enzyme. The peptide fragments used for sequencing were generated by treatment with trypsin, Staphylococcus aureus V8 protease, endoproteinase Lys-C, and cyanogen bromide. The enzyme subunit contains 308 residues and has a molecular weight of 33,920. To determine the coenzyme-binding site, the pyridoxal 5-phosphate containing enzyme was treated with tritiated sodium borohydride prior to trypsin digestion. Peptide map comparisons with an apoenzyme tryptic digest and monitoring radioactivity incorporation allowed identification of the pyridoxylated peptide, which was then isolated and sequenced. The coenzyme-binding site is the lysyl residue at position 159. The amino acid sequence of Salmonella transaminase B is 97.4% identical with that of Escherichia coli, differing in only eight amino acid positions. Sequence comparisons of transaminase B to other known aminotransferase sequences revealed limited sequence similarity (24-33%) when conserved amino acid substitutions are allowed and alignments were forced to occur on the coenzyme-binding site. PMID:2669973

  4. The Purine Bias of Coding Sequences is Determined by Physicochemical Constraints on Proteins

    PubMed Central

    de Leon, Miguel Ponce; de Miranda, Antonio Basilio; Alvarez-Valin, Fernando; Carels, Nicolas

    2014-01-01

    For this report, we analyzed protein secondary structures in relation to the statistics of three nucleotide codon positions. The purpose of this investigation was to find which properties of the ribosome, tRNA or protein level, could explain the purine bias (Rrr) as it is observed in coding DNA. We found that the Rrr pattern is the consequence of a regularity (the codon structure) resulting from physicochemical constraints on proteins and thermodynamic constraints on ribosomal machinery. The physicochemical constraints on proteins mainly come from the hydropathy and molecular weight (MW) of secondary structures as well as the energy cost of amino acid synthesis. These constraints appear through a network of statistical correlations, such as (i) the cost of amino acid synthesis, which is in favor of a higher level of guanine in the first codon position, (ii) the constructive contribution of hydropathy alternation in proteins, (iii) the spatial organization of secondary structure in proteins according to solvent accessibility, (iv) the spatial organization of secondary structure according to amino acid hydropathy, (v) the statistical correlation of MW with protein secondary structures and their overall hydropathy, (vi) the statistical correlation of thymine in the second codon position with hydropathy and the energy cost of amino acid synthesis, and (vii) the statistical correlation of adenine in the second codon position with amino acid complexity and the MW of secondary protein structures. Amino acid physicochemical properties and functional constraints on proteins constitute a code that is translated into a purine bias within the coding DNA via tRNAs. In that sense, the Rrr pattern within coding DNA is the effect of information transfer on nucleotide composition from protein to DNA by selection according to the codon positions. Thus, coding DNA structure and ribosomal machinery co-evolved to minimize the energy cost of protein coding given the functional

  5. Purifying selection shapes the coincident SNP distribution of primate coding sequences.

    PubMed

    Chen, Chia-Ying; Hung, Li-Yuan; Wu, Chan-Shuo; Chuang, Trees-Juen

    2016-01-01

    Genome-wide analysis has observed an excess of coincident single nucleotide polymorphisms (coSNPs) at human-chimpanzee orthologous positions, and suggested that this is due to cryptic variation in the mutation rate. While this phenomenon primarily corresponds with non-coding coSNPs, the situation in coding sequences remains unclear. Here we calculate the observed-to-expected ratio of coSNPs (coSNPO/E) to estimate the prevalence of human-chimpanzee coSNPs, and show that the excess of coSNPs is also present in coding regions. Intriguingly, coSNPO/E is much higher at zero-fold than at nonzero-fold degenerate sites; such a difference is due to an elevation of coSNPO/E at zero-fold degenerate sites, rather than a reduction at nonzero-fold degenerate ones. These trends are independent of chimpanzee subpopulation, population size, or sequencing techniques; and hold in broad generality across primates. We find that this discrepancy cannot be fully explained by sequence contexts, shared ancestral polymorphisms, SNP density, and recombination rate, and that coSNPO/E in coding sequences is significantly influenced by purifying selection. We also show that selection and mutation rate affect coSNPO/E independently, and coSNPs tend to be less damaging and more correlated with human diseases than non-coSNPs. These results suggest that coSNPs may represent a "signature" during primate protein evolution. PMID:27255481
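    One simple way to form an observed-to-expected ratio of coincident SNPs is to take, as the expectation, the number of coincidences that would arise if SNPs in the two species fell independently on the orthologous sites. The abstract does not state how the expected count was obtained, so the independence null and the function name below are assumptions of this sketch.

```python
def cosnp_obs_exp(n_sites, n_human_snps, n_chimp_snps, n_coincident):
    """Observed / expected coincident SNPs under an independence null.

    Expected coincidences = n_sites * (n_human_snps / n_sites) * (n_chimp_snps / n_sites).
    """
    expected = n_human_snps * n_chimp_snps / n_sites
    return n_coincident / expected


if __name__ == "__main__":
    # Made-up counts for illustration only: 1000 sites, 100 human and 50 chimpanzee SNPs,
    # 10 of which coincide. Expected coincidences = 5, so the ratio is 2.
    print(cosnp_obs_exp(1000, 100, 50, 10))
```

    A ratio above 1 indicates more coincident sites than independence would predict. Computing it separately for zero-fold and nonzero-fold degenerate sites gives the kind of contrast the abstract reports.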

  6. POLYMORPHISM IN THE CODING REGION SEQUENCE OF GDF8 GENE IN INDIAN SHEEP.

    PubMed

    Pothuraju, M; Mishra, S K; Kumar, S N; Mohamed, N F; Kataria, R S; Yadav, D K; Arora, R

    2015-11-01

    The present study was undertaken to identify polymorphism in the coding sequence of the GDF8 gene across indigenous meat-type sheep breeds. A 1647 bp sequence was generated, encompassing 208 bp of the 5'UTR, 1128 bp of coding region (exon 1, 2 and 3) as well as 311 bp of 3'UTR. The sheep and goat GDF8 gene sequences were observed to be highly conserved as compared to cattle, buffalo, horse and pig. Several nucleotide variations were observed across the coding sequence of the GDF8 gene in Indian sheep. Three polymorphic sites were identified in the 5'UTR, one in exon 1 and one in exon 2. Both SNPs in the exonic region were found to be non-synonymous. The mutations c.539T > G and c.821T > A discovered in this study in exon 1 and exon 2, respectively, have not been previously reported. The information generated provides a preliminary indication of the functional diversity present in Indian sheep at the coding region of the GDF8 gene. The novel as well as the previously reported SNPs discovered in the Indian sheep warrant further analysis to see whether they affect the phenotype. Future studies will need to establish the effect of the reported SNPs on the expression of the GDF8 gene in the Indian sheep population. PMID:26845859

  7. Purifying selection shapes the coincident SNP distribution of primate coding sequences

    PubMed Central

    Chen, Chia-Ying; Hung, Li-Yuan; Wu, Chan-Shuo; Chuang, Trees-Juen

    2016-01-01

    Genome-wide analysis has observed an excess of coincident single nucleotide polymorphisms (coSNPs) at human-chimpanzee orthologous positions, and suggested that this is due to cryptic variation in the mutation rate. While this phenomenon primarily corresponds with non-coding coSNPs, the situation in coding sequences remains unclear. Here we calculate the observed-to-expected ratio of coSNPs (coSNPO/E) to estimate the prevalence of human-chimpanzee coSNPs, and show that the excess of coSNPs is also present in coding regions. Intriguingly, coSNPO/E is much higher at zero-fold than at nonzero-fold degenerate sites; such a difference is due to an elevation of coSNPO/E at zero-fold degenerate sites, rather than a reduction at nonzero-fold degenerate ones. These trends are independent of chimpanzee subpopulation, population size, or sequencing techniques; and hold in broad generality across primates. We find that this discrepancy cannot be fully explained by sequence contexts, shared ancestral polymorphisms, SNP density, and recombination rate, and that coSNPO/E in coding sequences is significantly influenced by purifying selection. We also show that selection and mutation rate affect coSNPO/E independently, and coSNPs tend to be less damaging and more correlated with human diseases than non-coSNPs. These results suggest that coSNPs may represent a “signature” during primate protein evolution. PMID:27255481

  8. Complete coding sequence of Zika virus from Martinique outbreak in 2015.

    PubMed

    Piorkowski, G; Richard, P; Baronti, C; Gallian, P; Charrel, R; Leparc-Goffart, I; de Lamballerie, X

    2016-05-01

    Zika virus is an Aedes-borne Flavivirus causing fever, arthralgia, myalgia, and rash; it is associated with Guillain-Barré syndrome and suspected to induce microcephaly in the fetus. We report here the complete coding sequence of the first characterized Caribbean Zika virus strain, isolated from a patient from Martinique in December, 2015. PMID:27274849

  9. First Complete Coding Sequence of a Spanish Isolate of Swine Vesicular Disease Virus

    PubMed Central

    Vázquez-Calvo, Ángela; Saiz, Juan-Carlos; Martín-Acebes, Miguel A.

    2016-01-01

    Swine vesicular disease virus (SVDV) is a porcine pathogen and a member of the Enterovirus genus within the Picornaviridae family. The SVDV genome is composed of a single-stranded RNA molecule of positive polarity. Here, we report the first complete sequence of the coding region of a Spanish SVDV isolate (SPA/1/'93). PMID:26941157

  10. Complete Coding Sequences of Six Toscana Virus Strains Isolated from Human Patients in France

    PubMed Central

    Leparc-Goffart, Isabelle; Piorkowski, Geraldine; Coutard, Bruno; Papageorgiou, Nicolas; De Lamballerie, Xavier

    2016-01-01

    Toscana virus (TOSV) is an arthropod-borne phlebovirus belonging to the Sandfly fever Naples virus species (genus Phlebovirus, family Bunyaviridae). Here, we report the complete coding sequences of six TOSV strains isolated from human patients having acquired the infection in southeastern France during a 12-year period. PMID:27231377

  11. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

    PubMed

    Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles

  12. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing

    PubMed Central

    Dasenko, Mark A.

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles

  13. Amino acid sequence of bovine heart coupling factor 6.

    PubMed Central

    Fang, J K; Jacobs, J W; Kanner, B I; Racker, E; Bradshaw, R A

    1984-01-01

    The amino acid sequence of bovine heart mitochondrial coupling factor 6 (F6) has been determined by automated Edman degradation of the whole protein and derived peptides. Preparations based on heat precipitation and ethanol extraction showed allotypic variation at three positions while material further purified by HPLC yielded only one sequence that also differed by a Phe-Thr replacement at residue 62. The mature protein contains 76 amino acids with a calculated molecular weight of 9006 and a pI of approximately 5, in good agreement with experimentally measured values. The charged amino acids are mainly clustered at the termini and in one section in the middle; these three polar segments are separated by two segments relatively rich in nonpolar residues. Chou-Fasman analysis suggests three stretches of alpha-helix coinciding with (or within) the high-charge-density sequences with a single beta-turn at the first polar-nonpolar junction. Comparison of the F6 sequence with those of other proteins did not reveal any homologous structures. PMID:6149548

  14. Sequence analysis and identification of new variations in the coding sequence of melatonin receptor gene (MTNR1A) of Indian Chokla sheep breed

    PubMed Central

    Saxena, Vijay Kumar; Jha, Bipul Kumar; Meena, Amar Singh; Naqvi, S.M.K.

    2014-01-01

    Melatonin receptor 1A gene is the prime receptor mediating the effect of melatonin at the neuroendocrine level for control of seasonal reproduction in sheep. The aims of this study were to examine the polymorphism pattern of the coding sequence of the MTNR1A gene in Chokla sheep, a breed of the Indian arid tract, and to identify new variations in relation to its aseasonal status. Genomic DNAs of 101 Chokla sheep were collected and an 824 bp coding sequence of Exon II was amplified. RFLP was performed with the enzymes RsaI and MnlI to assess the presence of polymorphism at positions C606T and G612A, respectively. Genotyping revealed a significantly higher frequency of M and R alleles than m and r alleles. RR and MM were found to be dominantly present in the studied population. Cloning and sequencing of Exon II followed by mutation/polymorphism analysis revealed ten mutations, of which three were non-synonymous (G706A, C893A, G931C). G706A leads to substitution of valine by isoleucine Val125I (U14109) in the fifth transmembrane domain. C893A leads to substitution of alanine by aspartic acid in the third extracellular loop. The G931C mutation brings about substitution of alanine by proline in the seventh transmembrane helix, which can affect the conformational stability of the molecule. Polyphen-2 analysis revealed that the polymorphism at position 931 is potentially damaging while the mutations at positions 706 and 893 were benign. It is concluded that the G931C mutation of the MTNR1A gene may explain, in part, the importance of melatonin structure integrity in influencing seasonality in sheep. PMID:25606429

  15. Different evolutionary patterns of SNPs between domains and unassigned regions in human protein-coding sequences.

    PubMed

    Pang, Erli; Wu, Xiaomei; Lin, Kui

    2016-06-01

    Protein evolution plays an important role in the evolution of each genome. Because of their functional nature, in general, most of their parts or sites are differently constrained selectively, particularly by purifying selection. Most previous studies on protein evolution considered individual proteins in their entirety or compared protein-coding sequences with non-coding sequences. Less attention has been paid to the evolution of different parts within each protein of a given genome. To this end, based on PfamA annotation of all human proteins, each protein sequence can be split into two parts: domains or unassigned regions. Using this rationale, single nucleotide polymorphisms (SNPs) in protein-coding sequences from the 1000 Genomes Project were mapped according to two classifications: SNPs occurring within protein domains and those within unassigned regions. With these classifications, we found: the density of synonymous SNPs within domains is significantly greater than that of synonymous SNPs within unassigned regions; however, the density of non-synonymous SNPs shows the opposite pattern. We also found there are signatures of purifying selection on both the domain and unassigned regions. Furthermore, the selective strength on domains is significantly greater than that on unassigned regions. In addition, among all of the human protein sequences, there are 117 PfamA domains in which no SNPs are found. Our results highlight an important aspect of protein domains and may contribute to our understanding of protein evolution. PMID:26833483

  16. Nucleotide and predicted amino acid sequences of cloned human and mouse preprocathepsin B cDNAs.

    PubMed Central

    Chan, S J; San Segundo, B; McCormick, M B; Steiner, D F

    1986-01-01

    Cathepsin B is a lysosomal thiol proteinase that may have additional extralysosomal functions. To further our investigations on the structure, mode of biosynthesis, and intracellular sorting of this enzyme, we have determined the complete coding sequences for human and mouse preprocathepsin B by using cDNA clones isolated from human hepatoma and kidney phage libraries. The nucleotide sequences predict that the primary structure of preprocathepsin B contains 339 amino acids organized as follows: a 17-residue NH2-terminal prepeptide sequence followed by a 62-residue propeptide region, 254 residues in mature (single chain) cathepsin B, and a 6-residue extension at the COOH terminus. A comparison of procathepsin B sequences from three species (human, mouse, and rat) reveals that the homology between the propeptides is relatively conserved with a minimum of 68% sequence identity. In particular, two conserved sequences in the propeptide that may be functionally significant include a potential glycosylation site and the presence of a single cysteine at position 59. Comparative analysis of the three sequences also suggests that processing of procathepsin B is a multistep process, during which enzymatically active intermediate forms may be generated. The availability of the cDNA clones will facilitate the identification of possible active or inactive intermediate processive forms as well as studies on the transcriptional regulation of the cathepsin B gene. PMID:3463996

  17. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    NASA Technical Reports Server (NTRS)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
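    The two tests named above can be sketched in a few lines. The snippet counts overlapping n-tuples, returns their counts in rank order for a Zipf plot, and computes the Shannon n-gram entropy. The use of overlapping windows and the redundancy formula 1 - H_n / (2n), which normalizes by the maximum of 2 bits per base for a four-letter alphabet, are assumptions of this sketch and may differ from the definitions used by the authors.

```python
from collections import Counter
from math import log2


def ngram_counts(seq, n):
    """Count overlapping n-tuples ("words") in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def zipf_counts(seq, n):
    """Word counts sorted from most to least frequent (rank 1, 2, ...)."""
    return [count for _, count in ngram_counts(seq, n).most_common()]


def ngram_entropy(seq, n):
    """Shannon entropy of the n-tuple distribution, in bits."""
    counts = ngram_counts(seq, n)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def ngram_redundancy(seq, n):
    """Assumed definition: 1 - H_n / (2 n), with 2 bits per base as the maximum."""
    return 1 - ngram_entropy(seq, n) / (2 * n)


if __name__ == "__main__":
    toy = "ACGTACGTTTTTACGA"  # toy sequence, not real data
    print(zipf_counts(toy, 2))
    print(ngram_entropy(toy, 2), ngram_redundancy(toy, 2))
```

    Plotting the logarithm of the values from `zipf_counts` against the logarithm of rank would show whether a sequence follows a power law (a straight line) or a logarithmic dependence, which is the distinction the abstract draws between noncoding and coding regions.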

  18. Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis

    NASA Technical Reports Server (NTRS)

    Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Matsa, M. E.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05) which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10^-10. We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.
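    A minimal sketch of detrended fluctuation analysis on a DNA sequence is given below. The purine/pyrimidine walk (+1 for A or G, -1 for C or T), the non-overlapping boxes, the linear detrending, and the box sizes are assumptions chosen for illustration; the abstract does not specify these details.

```python
import random
from math import log, sqrt

PURINES = {"A", "G"}


def dna_walk(seq):
    """Cumulative walk: step +1 for a purine, -1 for a pyrimidine."""
    walk, position = [], 0
    for base in seq:
        position += 1 if base in PURINES else -1
        walk.append(position)
    return walk


def linear_fit(xs, ys):
    """Least-squares slope and intercept of ys against xs (needs at least 2 distinct xs)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fluctuation(walk, box):
    """RMS residual of the walk after removing a linear trend in each box.

    Requires 2 <= box <= len(walk); a trailing partial box is ignored.
    """
    xs = list(range(box))
    squared, count = 0.0, 0
    for start in range(0, len(walk) - box + 1, box):
        ys = walk[start:start + box]
        slope, intercept = linear_fit(xs, ys)
        squared += sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
        count += box
    return sqrt(squared / count)


def dfa_exponent(seq, boxes=(4, 8, 16, 32, 64)):
    """Slope of log F(box) against log box.

    Values near 0.5 indicate no long-range correlation. Fails with a math
    domain error if F is exactly zero, as for a homopolymer.
    """
    walk = dna_walk(seq)
    log_sizes = [log(b) for b in boxes]
    log_f = [log(fluctuation(walk, b)) for b in boxes]
    slope, _ = linear_fit(log_sizes, log_f)
    return slope


if __name__ == "__main__":
    rng = random.Random(0)
    uncorrelated = "".join(rng.choice("ACGT") for _ in range(4000))
    print(dfa_exponent(uncorrelated))  # expected to lie near 0.5 for random bases
```

    Because the trend is removed box by box, slow drifts in nucleotide composition contribute little to F, which is why the method suits the "patchy" sequences mentioned in the abstract.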

  19. Sequences Of Amino Acids For Human Serum Albumin

    NASA Technical Reports Server (NTRS)

    Carter, Daniel C.

    1992-01-01

    Sequences of amino acids defined for use in making polypeptides one-third to one-sixth as large as parent human serum albumin molecule. Smaller, chemically stable peptides have diverse applications including service as artificial human serum and as active components of biosensors and chromatographic matrices. In applications involving production of artificial sera from new sequences, little or no concern about viral contaminants. Smaller genetically engineered polypeptides more easily expressed and produced in large quantities, making commercial isolation and production more feasible and profitable.

  20. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

    SciTech Connect

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; Lamson, Jacob S.; He, Jennifer; Hoover, Cindi A.; Blow, Matthew J.; Bristow, James; Butland, Gareth; Arkin, Adam P.; Deutschbauer, Adam

    2015-05-12

    Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes

  1. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

    DOE PAGES Beta

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; Lamson, Jacob S.; He, Jennifer; Hoover, Cindi A.; Blow, Matthew J.; Bristow, James; Butland, Gareth; Arkin, Adam P.; et al

    2015-05-12

    Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes. However, the current strategies for TnSeq are
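
    The abstract above does not state how bar code counts are converted into a fitness value. As a rough illustration only, the sketch below assumes one common convention: the fitness of a bar-coded strain is the log2 ratio of its read count after growth to its count before growth, centered so that a typical strain scores zero. The function names, the pseudocount and the example counts are all hypothetical and are not taken from the paper.

```python
import math


def strain_fitness(count_start: int, count_end: int, pseudocount: float = 1.0) -> float:
    """log2 ratio of bar code counts after vs. before growth.

    The pseudocount (an assumption) avoids taking the log of zero
    when a strain drops out of the pool.
    """
    return math.log2((count_end + pseudocount) / (count_start + pseudocount))


def normalized_fitness(start_counts: dict, end_counts: dict) -> dict:
    """Center raw fitness values on their median, so a typical strain scores 0."""
    raw = {bc: strain_fitness(n, end_counts.get(bc, 0)) for bc, n in start_counts.items()}
    values = sorted(raw.values())
    mid = len(values) // 2
    median = values[mid] if len(values) % 2 else (values[mid - 1] + values[mid]) / 2
    return {bc: f - median for bc, f in raw.items()}


if __name__ == "__main__":
    # Made-up counts for three bar codes; "bc3" is lost after growth.
    start = {"bc1": 100, "bc2": 100, "bc3": 100}
    end = {"bc1": 100, "bc2": 100, "bc3": 0}
    for barcode, fitness in normalized_fitness(start, end).items():
        print(barcode, round(fitness, 2))
```

    Under this convention a strongly negative value marks a mutant that was depleted under the tested condition, which is the kind of signal a per-gene phenotype call would be built on.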

  2. Complete nucleic acid sequence of Penaeus stylirostris densovirus (PstDNV) from India.

    PubMed

    Rai, Praveen; Safeena, Muhammed P; Karunasagar, Iddya; Karunasagar, Indrani

    2011-06-01

    Infectious hypodermal and hematopoietic necrosis virus (IHHNV) of shrimp has recently been classified as Penaeus stylirostris densovirus (PstDNV). The complete nucleic acid sequence of PstDNV from India was obtained by cloning and sequencing different DNA fragments of the virus. The genome organisation of PstDNV revealed three major coding domains: a left ORF (NS1) of 2001 bp, a mid ORF (NS2) of 1092 bp and a right ORF (VP) of 990 bp. The complete genome and the amino acid sequences of the three proteins, viz. NS1, NS2 and VP, were compared with the genomes of the virus reported from Hawaii, China and Mexico and with partial sequences available from isolates from different regions. The phylogenetic analysis of shrimp, insect and vertebrate parvovirus sequences showed that the Indian PstDNV isolate is phylogenetically more closely related to one of the three isolates from Taiwan (AY355307), and two isolates (AY362547 and AY102034) from Thailand. PMID:21402111

  3. Nanopores and nucleic acids: prospects for ultrarapid sequencing

    NASA Technical Reports Server (NTRS)

    Deamer, D. W.; Akeson, M.

    2000-01-01

    DNA and RNA molecules can be detected as they are driven through a nanopore by an applied electric field at rates ranging from several hundred microseconds to a few milliseconds per molecule. The nanopore can rapidly discriminate between pyrimidine and purine segments along a single-stranded nucleic acid molecule. Nanopore detection and characterization of single molecules represents a new method for directly reading information encoded in linear polymers. If single-nucleotide resolution can be achieved, it is possible that nucleic acid sequences can be determined at rates exceeding a thousand bases per second.

  4. Amino acid sequence of the Amur tiger prion protein.

    PubMed

    Wu, Changde; Pang, Wanyong; Zhao, Deming

    2006-10-01

    Prion diseases are fatal neurodegenerative disorders in humans and animals associated with conformational conversion of a cellular prion protein (PrP(C)) into the pathologic isoform (PrP(Sc)). Various data indicate that polymorphisms within the open reading frame (ORF) of PrP are associated with susceptibility and control the species barrier in prion diseases. In the present study, partial Prnp from 25 Amur tigers (tPrnp) were cloned and screened for polymorphisms. Four single nucleotide polymorphisms (T423C, A501G, C511A, A610G) were found; the C511A and A610G nucleotide substitutions resulted in the amino acid changes Lysine171Glutamine and Alanine204Threonine, respectively. The tPrnp amino acid sequence is similar to those of the house cat (Felis catus) and sheep, but differs significantly from two other cat Prnp sequences that were previously deposited in GenBank. PMID:16780982
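
    The residue numbers quoted above follow from the nucleotide positions by simple arithmetic. The short sketch below maps each reported SNP position to its codon number and to its place within that codon. It assumes, as is conventional but not stated in the abstract, that nucleotide 1 is the first base of the start codon; the helper name codon_of is hypothetical.

```python
def codon_of(nt_position: int) -> tuple[int, int]:
    """Return (codon number, position within codon), both 1-based.

    Assumes nucleotide 1 is the first base of the open reading frame.
    """
    if nt_position < 1:
        raise ValueError("nucleotide positions are 1-based")
    return (nt_position - 1) // 3 + 1, (nt_position - 1) % 3 + 1


if __name__ == "__main__":
    for snp in ("T423C", "A501G", "C511A", "A610G"):
        position = int(snp[1:-1])  # strip the reference and variant bases
        codon, offset = codon_of(position)
        print(f"{snp}: codon {codon}, base {offset} of the codon")
    # T423C: codon 141, base 3 of the codon
    # A501G: codon 167, base 3 of the codon
    # C511A: codon 171, base 1 of the codon
    # A610G: codon 204, base 1 of the codon
```

    Positions 511 and 610 fall on the first base of codons 171 and 204, which matches the residue numbers given for the two amino acid changes. The other two SNPs fall on third codon positions, where substitutions are often synonymous.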

  5. The Evolution of Bony Vertebrate Enhancers at Odds with Their Coding Sequence Landscape

    PubMed Central

    Yousaf, Aisha; Sohail Raza, Muhammad; Ali Abbasi, Amir

    2015-01-01

    Enhancers lie at the heart of transcriptional and developmental gene regulation. Therefore, changes in enhancer sequences usually disrupt the target gene expression and result in disease phenotypes. Despite the well-established role of enhancers in development and disease, evolutionary sequence studies are lacking. The current study attempts to unravel the puzzle of bony vertebrates’ conserved noncoding elements (CNE) enhancer evolution. Bayesian phylogenetics of enhancer sequences spotlights promising interordinal relationships among placental mammals, proposing a closer relationship between humans and laurasiatherians while placing rodents at the basal position. Clock-based estimates of enhancer evolution provided a dynamic picture of interspecific rate changes across the bony vertebrate lineage. Moreover, coelacanth in the study augmented our appreciation of the vertebrate cis-regulatory evolution during water–land transition. Intriguingly, we observed a pronounced upsurge in enhancer evolution in land-dwelling vertebrates. These novel findings triggered us to further investigate the evolutionary trend of coding as well as CNE nonenhancer repertoires, to highlight the relative evolutionary dynamics of diverse genomic landscapes. Surprisingly, the evolutionary rates of enhancer sequences were clearly at odds with those of the coding and the CNE nonenhancer sequences during vertebrate adaptation to land, with land vertebrates exhibiting significantly reduced rates of coding sequence evolution in comparison to their fast evolving regulatory landscape. The observed variation in tetrapod cis-regulatory elements caused the fine-tuning of associated gene regulatory networks. Therefore, the increased evolutionary rate of tetrapods’ enhancer sequences might be responsible for the variation in developmental regulatory circuits during the process of vertebrate adaptation to land. PMID:26253316

  6. The Evolution of Bony Vertebrate Enhancers at Odds with Their Coding Sequence Landscape.

    PubMed

    Yousaf, Aisha; Sohail Raza, Muhammad; Ali Abbasi, Amir

    2015-08-01

    Enhancers lie at the heart of transcriptional and developmental gene regulation. Therefore, changes in enhancer sequences usually disrupt the target gene expression and result in disease phenotypes. Despite the well-established role of enhancers in development and disease, evolutionary sequence studies are lacking. The current study attempts to unravel the puzzle of bony vertebrates' conserved noncoding elements (CNE) enhancer evolution. Bayesian phylogenetics of enhancer sequences spotlights promising interordinal relationships among placental mammals, proposing a closer relationship between humans and laurasiatherians while placing rodents at the basal position. Clock-based estimates of enhancer evolution provided a dynamic picture of interspecific rate changes across the bony vertebrate lineage. Moreover, coelacanth in the study augmented our appreciation of the vertebrate cis-regulatory evolution during water-land transition. Intriguingly, we observed a pronounced upsurge in enhancer evolution in land-dwelling vertebrates. These novel findings triggered us to further investigate the evolutionary trend of coding as well as CNE nonenhancer repertoires, to highlight the relative evolutionary dynamics of diverse genomic landscapes. Surprisingly, the evolutionary rates of enhancer sequences were clearly at odds with those of the coding and the CNE nonenhancer sequences during vertebrate adaptation to land, with land vertebrates exhibiting significantly reduced rates of coding sequence evolution in comparison to their fast evolving regulatory landscape. The observed variation in tetrapod cis-regulatory elements caused the fine-tuning of associated gene regulatory networks. Therefore, the increased evolutionary rate of tetrapods' enhancer sequences might be responsible for the variation in developmental regulatory circuits during the process of vertebrate adaptation to land. PMID:26253316

  7. Large-scale coding sequence change underlies the evolution of postdevelopmental novelty in honey bees.

    PubMed

    Jasper, William Cameron; Linksvayer, Timothy A; Atallah, Joel; Friedman, Daniel; Chiu, Joanna C; Johnson, Brian R

    2015-02-01

    Whether coding or regulatory sequence change is more important to the evolution of phenotypic novelty is one of biology's major unresolved questions. The field of evo-devo has shown that in early development changes to regulatory regions are the dominant mode of genetic change, but whether this extends to the evolution of novel phenotypes in the adult organism is unclear. Here, we conduct ten RNA-Seq experiments across both novel and conserved tissues in the honey bee to determine to what extent postdevelopmental novelty is based on changes to the coding regions of genes. We make several discoveries. First, we show that with respect to novel physiological functions in the adult animal, positively selected tissue-specific genes of high expression underlie novelty by conferring specialized cellular functions. Such genes are often, but not always taxonomically restricted genes (TRGs). We further show that positively selected genes, whether TRGs or conserved genes, are the least connected genes within gene expression networks. Overall, this work suggests that the evo-devo paradigm is limited, and that the evolution of novelty, postdevelopment, follows additional rules. Specifically, evo-devo stresses that high network connectedness (repeated use of the same gene in many contexts) constrains coding sequence change as it would lead to negative pleiotropic effects. Here, we show that in the adult animal, the converse is true: Genes with low network connectedness (TRGs and tissue-specific conserved genes) underlie novel phenotypes by rapidly changing coding sequence to perform new-specialized functions. PMID:25351750

  8. Using a color-coded ambigraphic nucleic acid notation to visualize conserved palindromic motifs within and across genomes

    PubMed Central

    2014-01-01

    Background: Ambiscript is a graphically-designed nucleic acid notation that uses symbol symmetries to support sequence complementation, highlight biologically-relevant palindromes, and facilitate the analysis of consensus sequences. Although the original Ambiscript notation was designed to easily represent consensus sequences for multiple sequence alignments, the notation’s black-on-white ambiguity characters are unable to reflect the statistical distribution of nucleotides found at each position. We now propose a color-augmented ambigraphic notation to encode the frequency of positional polymorphisms in these consensus sequences. Results: We have implemented this color-coding approach by creating an Adobe Flash® application (http://www.ambiscript.org) that shades and colors modified Ambiscript characters according to the prevalence of the encoded nucleotide at each position in the alignment. The resulting graphic helps viewers perceive biologically-relevant patterns in multiple sequence alignments by uniquely combining color, shading, and character symmetries to highlight palindromes and inverted repeats in conserved DNA motifs. Conclusion: Juxtaposing an intuitive color scheme over the deliberate character symmetries of an ambigraphic nucleic acid notation yields a highly-functional nucleic acid notation that maximizes information content and successfully embodies key principles of graphic excellence put forth by the statistician and graphic design theorist, Edward Tufte. PMID:24447494

  9. Affinity regression predicts the recognition code of nucleic acid binding proteins

    PubMed Central

    Pelossof, Raphael; Singh, Irtisha; Yang, Julie L.; Weirauch, Matthew T.; Hughes, Timothy R.; Leslie, Christina S.

    2016-01-01

    Predicting the affinity profiles of nucleic acid-binding proteins directly from the protein sequence is a major unsolved problem. We present a statistical approach for learning the recognition code of a family of transcription factors (TFs) or RNA-binding proteins (RBPs) from high-throughput binding assays. Our method, called affinity regression, trains on protein binding microarray (PBM) or RNAcompete experiments to learn an interaction model between proteins and nucleic acids, using only protein domain and probe sequences as inputs. By training on mouse homeodomain PBM profiles, our model correctly identifies residues that confer DNA-binding specificity and accurately predicts binding motifs for an independent set of divergent homeodomains. Similarly, learning from RNAcompete profiles for diverse RBPs, our model can predict the binding affinities of held-out proteins and identify key RNA-binding residues. More broadly, we envision applying our method to model and predict biological interactions in any setting where there is a high-throughput ‘affinity’ readout. PMID:26571099

  10. Widespread Differential Expression of Coding Region and 3' UTR Sequences in Neurons and Other Tissues.

    PubMed

    Kocabas, Arif; Duarte, Terence; Kumar, Saranya; Hynes, Mary A

    2015-12-16

    Mature messenger RNAs (mRNAs) consist of coding sequence (CDS) and 5' and 3' UTRs, typically expected to show similar abundance within a given neuron. Examining mRNA from defined neurons, we unexpectedly show extremely common unbalanced expression of cognate 3' UTR and CDS sequences; many genes show high 3' UTR relative to CDS, others show high CDS to 3' UTR. In situ hybridization (19 of 19 genes) shows a broad range of 3' UTR-to-CDS expression ratios across neurons and tissues. Ratios may be spatially graded or change with developmental age but are consistent across animals. Further, for two genes examined, a 3' UTR-to-CDS ratio above a particular threshold in any given neuron correlated with reduced or undetectable protein expression. Our findings raise questions about the role of isolated 3' UTR sequences in regulation of protein expression and highlight the importance of separately examining 3' UTR and CDS sequences in gene expression analyses. PMID:26687222

  11. The Complete Genome Sequence of the Lactic Acid Bacterium Lactococcus lactis ssp. lactis IL1403

    PubMed Central

    Bolotin, Alexander; Wincker, Patrick; Mauger, Stéphane; Jaillon, Olivier; Malarme, Karine; Weissenbach, Jean; Ehrlich, S. Dusko; Sorokin, Alexei

    2001-01-01

    Lactococcus lactis is a nonpathogenic AT-rich gram-positive bacterium closely related to the genus Streptococcus and is the most commonly used cheese starter. It is also the best-characterized lactic acid bacterium. We sequenced the genome of the laboratory strain IL1403, using a novel two-step strategy that comprises diagnostic sequencing of the entire genome and a shotgun polishing step. The genome contains 2,365,589 base pairs and encodes 2310 proteins, including 293 protein-coding genes belonging to six prophages and 43 insertion sequence (IS) elements. Nonrandom distribution of IS elements indicates that the chromosome of the sequenced strain may be a product of recent recombination between two closely related genomes. A complete set of late competence genes is present, indicating the ability of L. lactis to undergo DNA transformation. Genomic sequence revealed new possibilities for fermentation pathways and for aerobic respiration. It also indicated a horizontal transfer of genetic information from Lactococcus to gram-negative enteric bacteria of Salmonella-Escherichia group. [The sequence data described in this paper has been submitted to the GenBank data library under accession no. AE005176.] PMID:11337471

  12. Oxytocin receptor gene sequences in owl monkeys and other primates show remarkable interspecific regulatory and protein coding variation.

    PubMed

    Babb, Paul L; Fernandez-Duque, Eduardo; Schurr, Theodore G

    2015-10-01

    The oxytocin (OT) hormone pathway is involved in numerous physiological processes, and one of its receptor genes (OXTR) has been implicated in pair bonding behavior in mammalian lineages. This observation is important for understanding social monogamy in primates, which occurs in only a small subset of taxa, including Azara's owl monkey (Aotus azarae). To examine the potential relationship between social monogamy and OXTR variation, we sequenced its 5' regulatory (4936bp) and coding (1167bp) regions in 25 owl monkeys from the Argentinean Gran Chaco, and examined OXTR sequences from 1092 humans from the 1000 Genomes Project. We also assessed interspecific variation of OXTR in 25 primate and rodent species that represent a set of phylogenetically and behaviorally disparate taxa. Our analysis revealed substantial variation in the putative 5' regulatory region of OXTR, with marked structural differences across primate taxa, particularly for humans and chimpanzees, which exhibited unique patterns of large motifs of dinucleotide A+T repeats upstream of the OXTR 5' UTR. In addition, we observed a large number of amino acid substitutions in the OXTR CDS region among New World primate taxa that distinguish them from Old World primates. Furthermore, primate taxa traditionally defined as socially monogamous (e.g., gibbons, owl monkeys, titi monkeys, and saki monkeys) all exhibited different amino acid motifs for their respective OXTR protein coding sequences. These findings support the notion that monogamy has evolved independently in Old World and New World primates, and that it has done so through different molecular mechanisms, not exclusively through the oxytocin pathway. PMID:26025428

  13. Quantum-Sequencing: Biophysics of quantum tunneling through nucleic acids

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    Tunneling microscopy and spectroscopy have been used extensively in the physical surface sciences to study quantum tunneling, to measure the electronic local density of states of nanomaterials, and to characterize adsorbed species. Quantum-Sequencing (Q-Seq) is a new method based on tunneling microscopy for electronic sequencing of single molecules of nucleic acids. A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free single-molecule sequencing method. Here, we present the unique "electronic fingerprints" for all nucleotides on DNA and RNA using Q-Seq, along with their intrinsic biophysical parameters. We have analyzed tunneling spectra for the nucleotides at different pH conditions and analyzed the HOMO, LUMO and energy gap for all of them. In addition, we show a number of biophysical parameters to further characterize all nucleobases (electron and hole transition voltage and energy barriers). These results highlight the robustness of Q-Seq as a technique for next-generation sequencing.

  14. Nucleotide and predicted amino acid sequence of a cDNA clone encoding part of human transketolase.

    PubMed

    Abedinia, M; Layfield, R; Jones, S M; Nixon, P F; Mattick, J S

    1992-03-31

    Transketolase is a key enzyme in the pentose-phosphate pathway which has been implicated in the latent human genetic disease, Wernicke-Korsakoff syndrome. Here we report the cloning and partial characterisation of the coding sequences encoding human transketolase from a human brain cDNA library. The library was screened with oligonucleotide probes based on the amino acid sequence of proteolytic fragments of the purified protein. Northern blots showed that the transketolase mRNA is approximately 2.2 kb, close to the minimum expected, of which approximately 60% was represented in the largest cDNA clone. Sequence analysis of the transketolase coding sequences reveals a number of homologies with related enzymes from other species. PMID:1567394

  15. SinEx DB: a database for single exon coding sequences in mammalian genomes

    PubMed Central

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F.; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S.

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as ‘single exon genes’ (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33, and the complete dataset of SEG sequences and their functional predictions is available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl PMID:27278816

  16. SinEx DB: a database for single exon coding sequences in mammalian genomes.

    PubMed

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33, and the complete dataset of SEG sequences and their functional predictions is available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl. PMID:27278816

  17. Sequence Prediction With Sparse Distributed Hyperdimensional Coding Applied to the Analysis of Mobile Phone Use Patterns.

    PubMed

    Rasanen, Okko J; Saarinen, Jukka P

    2016-09-01

    Modeling and prediction of temporal sequences is central to many signal processing and machine learning applications. Prediction based on sequence history is typically performed using parametric models, such as fixed-order Markov chains (n-grams), approximations of high-order Markov processes, such as mixed-order Markov models or mixtures of lagged bigram models, or with other machine learning techniques. This paper presents a method for sequence prediction based on sparse hyperdimensional coding of the sequence structure and describes how higher order temporal structures can be utilized in sparse coding in a balanced manner. The method is purely incremental, allowing real-time online learning and prediction with limited computational resources. Experiments with prediction of mobile phone use patterns, including the prediction of the next launched application, the next GPS location of the user, and the next artist played with the phone media player, reveal that the proposed method is able to capture the relevant variable-order structure from the sequences. In comparison with the n-grams and the mixed-order Markov models, the sparse hyperdimensional predictor clearly outperforms its peers in terms of unweighted average recall and achieves an equal level of weighted average recall as the mixed-order Markov chain but without the batch training of the mixed-order model. PMID:26285224
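
    For orientation, the sketch below shows the simplest member of the baseline family named in the abstract: a first-order Markov chain (bigram) predictor that is updated one symbol at a time. It is not the sparse hyperdimensional method the paper proposes, whose details the abstract does not give. The class name and the example application names are hypothetical.

```python
from collections import Counter, defaultdict


class BigramPredictor:
    """First-order Markov (bigram) next-symbol predictor, updated incrementally."""

    def __init__(self):
        # counts[a][b] = number of times symbol b followed symbol a
        self.counts = defaultdict(Counter)
        self.prev = None

    def update(self, symbol):
        """Observe one symbol and update the transition counts."""
        if self.prev is not None:
            self.counts[self.prev][symbol] += 1
        self.prev = symbol

    def predict(self, context=None):
        """Most frequent follower of `context` (default: the last symbol seen)."""
        key = self.prev if context is None else context
        followers = self.counts.get(key)
        if not followers:
            return None  # context never seen with a follower
        return followers.most_common(1)[0][0]


if __name__ == "__main__":
    model = BigramPredictor()
    for app in ["mail", "browser", "mail", "browser", "mail", "maps"]:
        model.update(app)
    print(model.predict("mail"))  # browser
    print(model.predict())        # None ("maps" has no observed follower yet)
```

    A fixed-order model like this only ever looks one symbol back, which is the limitation that variable-order approaches are meant to address.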

  18. Correlation between fibroin amino acid sequence and physical silk properties.

    PubMed

    Fedic, Robert; Zurovec, Michal; Sehnal, Frantisek

    2003-09-12

    The fiber properties of lepidopteran silk depend on the amino acid repeats that interact during H-fibroin polymerization. The aim of our research was to relate repeat composition to insect biology and fiber strength. Representative regions of the H-fibroin genes were sequenced and analyzed in three pyralid species: wax moth (Galleria mellonella), European flour moth (Ephestia kuehniella), and Indian meal moth (Plodia interpunctella). The amino acid repeats are species-specific, evidently a diversification of an ancestral region of 43 residues, and include three types of regularly dispersed motifs: modifications of GSSAASAA sequence, stretches of tripeptides GXZ where X and Z represent bulky residues, and sequences similar to PVIVIEE. No concatenations of GX dipeptide or alanine, which are typical for Bombyx silkworms and Antheraea silk moths, respectively, were found. Despite different repeat structure, the silks of G. mellonella and E. kuehniella exhibit similar tensile strength as the Bombyx and Antheraea silks. We suggest that in these latter two species, variations in the repeat length obstruct repeat alignment, but sufficiently long stretches of iterated residues get superposed to interact. In the pyralid H-fibroins, interactions of the widely separated and diverse motifs depend on the precision of repeat matching; silk is strong in G. mellonella and E. kuehniella, with 2-3 types of long homogeneous repeats, and nearly 10 times weaker in P. interpunctella, with seven types of shorter erratic repeats. The high proportion of large amino acids in the H-fibroin of pyralids has probably evolved in connection with the spinning habit of caterpillars that live in protective silk tubes and spin continuously, enlarging the tubes on one end and partly devouring the other one. The silk serves as a depot of energetically rich and essential amino acids that may be scarce in the diet. PMID:12816957

  19. Amino acid sequence of the nonsecretory ribonuclease of human urine.

    PubMed

    Beintema, J J; Hofsteenge, J; Iwama, M; Morita, T; Ohgi, K; Irie, M; Sugiyama, R H; Schieven, G L; Dekker, C A; Glitz, D G

    1988-06-14

    The amino acid sequence of a nonsecretory ribonuclease isolated from human urine was determined except for the identity of the residue at position 7. Sequence information indicates that the ribonucleases of human liver and spleen and an eosinophil-derived neurotoxin are identical or very closely related gene products. The sequence is identical at about 30% of the amino acid positions with those of all of the secreted mammalian ribonucleases for which information is available. Identical residues include active-site residues histidine-12, histidine-119, and lysine-41, other residues known to be important for substrate binding and catalytic activity, and all eight half-cystine residues common to these enzymes. Major differences include a deletion of six residues in the (so-called) S-peptide loop, insertions of two, and nine residues, respectively, in three other external loops of the molecule, and an addition of three residues at the amino terminus. The sequence shows the human nonsecretory ribonuclease to belong to the same ribonuclease superfamily as the mammalian secretory ribonucleases, turtle pancreatic ribonuclease, and human angiogenin. Sequence data suggest that a gene duplication occurred in an ancient vertebrate ancestor; one branch led to the nonsecretory ribonuclease, while the other branch led to a second duplication, with one line leading to the secretory ribonucleases (in mammals) and the second line leading to pancreatic ribonuclease in turtle and an angiogenic factor in mammals (human angiogenin). The nonsecretory ribonuclease has five short carbohydrate chains attached via asparagine residues at the surface of the molecule; these chains may have been shortened by exoglycosidase action.(ABSTRACT TRUNCATED AT 250 WORDS) PMID:3166997

  20. Inference of Episodic Changes in Natural Selection Acting on Protein Coding Sequences via CODEML.

    PubMed

    Bielawski, Joseph P; Baker, Jennifer L; Mingrone, Joseph

    2016-01-01

    This unit provides protocols for using the CODEML program from the PAML package to make inferences about episodic natural selection in protein-coding sequences. The protocols cover inference tasks such as maximum likelihood estimation of selection intensity, testing the hypothesis of episodic positive selection, and identifying sites with a history of episodic evolution. We provide protocols for using the rich set of models implemented in CODEML to assess robustness, and for using bootstrapping to assess if the requirements for reliable statistical inference have been met. An example dataset is used to illustrate how the protocols are used with real protein-coding sequences. The workflow of this design, through automation, is readily extendable to a larger-scale evolutionary survey. © 2016 by John Wiley & Sons, Inc. PMID:27322407
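
    The abstract does not say which statistic the hypothesis tests use. Nested codon models are commonly compared with a likelihood ratio test, and the sketch below assumes that approach with one degree of freedom. The log-likelihood values are invented for illustration and are not CODEML output, and the function name is hypothetical. Some tests of positive selection use a different null distribution, so the chi-square reference here is an assumption too.

```python
import math


def lrt_pvalue_df1(lnl_null: float, lnl_alt: float) -> tuple[float, float]:
    """Likelihood ratio test for two nested models differing by one parameter.

    Returns (test statistic, p-value). The statistic 2 * (lnL_alt - lnL_null)
    is compared against a chi-square distribution with 1 degree of freedom,
    whose upper tail is erfc(sqrt(x / 2)).
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Invented log-likelihoods for a null model and an alternative model
    # that adds one parameter allowing positive selection.
    stat, p = lrt_pvalue_df1(lnl_null=-1000.0, lnl_alt=-996.5)
    print(f"2*delta(lnL) = {stat:.2f}, p = {p:.4f}")
```

    The same function can be applied to each pair of nested fits in a larger survey, which is one way the automated workflow mentioned in the abstract could be organized.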

  1. CodHonEditor: Spreadsheets for Codon Optimization and Editing of Protein Coding Sequences.

    PubMed

    Takai, Kazuyuki

    2016-05-01

    Gene synthesis is getting more important with the growing availability of low-cost commercial services. The coding sequences are often "optimized" as for the relative synonymous codon usage (RSCU) before synthesis, which is generally included in the commercial services. However, the codon optimization processes are different among different providers and are often hidden from the users. Here, the d'Hondt method, which is widely adopted as a method for determining the number of seats for each party in proportional-representation public elections, is applied to RSCU fitting. This allowed me to make a set of electronic spreadsheets for manual design of protein coding sequences for expression in Escherichia coli, with which users can see the process of codon optimization and can manually edit the codons after the automatic optimization. The spreadsheets may also be useful for molecular biology education. PMID:27002987
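
    The d'Hondt method mentioned above is a highest-averages rule: each "seat" goes to the party whose weight divided by (seats already won + 1) is largest. Applied to codon choice, the seats are the occurrences of one amino acid in the protein and the parties are its synonymous codons. The sketch below is a generic illustration of that rule, not the implementation used in the CodHonEditor spreadsheets. The codon weights are made-up numbers, not real Escherichia coli usage data, and the function name is hypothetical.

```python
def dhondt_allocate(weights: dict[str, float], seats: int) -> dict[str, int]:
    """Allocate `seats` among the keys of `weights` by the d'Hondt rule."""
    allocation = {name: 0 for name in weights}
    for _ in range(seats):
        # The next seat goes to the highest quotient weight / (seats held + 1).
        # Ties are broken by dictionary order.
        winner = max(weights, key=lambda name: weights[name] / (allocation[name] + 1))
        allocation[winner] += 1
    return allocation


if __name__ == "__main__":
    # Hypothetical relative usage weights for three synonymous codons.
    usage = {"CTG": 50, "TTA": 30, "CTC": 20}
    print(dhondt_allocate(usage, 10))  # {'CTG': 5, 'TTA': 3, 'CTC': 2}
    print(dhondt_allocate(usage, 4))   # {'CTG': 2, 'TTA': 1, 'CTC': 1}
```

    With ten occurrences the allocation matches the weights exactly. With only four it still returns whole-number codon counts that track the target usage as closely as the rule allows, which is what makes the method convenient for short proteins.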

  2. Characterization and amino acid sequence of a fatty acid-binding protein from human heart.

    PubMed

    Offner, G D; Brecher, P; Sawlivich, W B; Costello, C E; Troxler, R F

    1988-05-15

    The complete amino acid sequence of a fatty acid-binding protein from human heart was determined by automated Edman degradation of CNBr, BNPS-skatole [3'-bromo-3-methyl-2-(2-nitrobenzenesulphenyl)indolenine], hydroxylamine, Staphylococcus aureus V8 proteinase, tryptic and chymotryptic peptides, and by digestion of the protein with carboxypeptidase A. The sequence of the blocked N-terminal tryptic peptide from citraconylated protein was determined by collisionally induced decomposition mass spectrometry. The protein contains 132 amino acid residues, is enriched with respect to threonine and lysine, lacks cysteine, has an acetylated valine residue at the N-terminus, and has an Mr of 14768 and an isoelectric point of 5.25. This protein contains two short internal repeated sequences from residues 48-54 and from residues 114-119 located within regions of predicted beta-structure and decreasing hydrophobicity. These short repeats are contained within two longer repeated regions from residues 48-60 and residues 114-125, which display 62% sequence similarity. These regions could accommodate the charged and uncharged moieties of long-chain fatty acids and may represent fatty acid-binding domains consistent with the finding that human heart fatty acid-binding protein binds 2 mol of oleate or palmitate/mol of protein. Detailed evidence for the amino acid sequences of the peptides has been deposited as Supplementary Publication SUP 50143 (23 pages) at the British Library Lending Division, Boston Spa, Yorkshire LS23 7BQ, U.K., from whom copies may be obtained as indicated in Biochem. J. (1988) 249, 5. PMID:3421901

  3. ANTICALIgN: visualizing, editing and analyzing combined nucleotide and amino acid sequence alignments for combinatorial protein engineering.

    PubMed

    Jarasch, Alexander; Kopp, Melanie; Eggenstein, Evelyn; Richter, Antonia; Gebauer, Michaela; Skerra, Arne

    2016-07-01

    ANTICALIgN is interactive software developed to simultaneously visualize, analyze and modify alignments of DNA and/or protein sequences that arise during combinatorial protein engineering, design and selection. ANTICALIgN combines powerful functions known from currently available sequence analysis tools with unique features for protein engineering, in particular the possibility to display and manipulate nucleotide sequences and their translated amino acid sequences at the same time. ANTICALIgN offers both template-based multiple sequence alignment (MSA), using the unmutated protein as reference, and conventional global alignment, to compare sequences that share an evolutionary relationship. The application of similarity-based clustering algorithms facilitates the identification of duplicates or of conserved sequence features among a set of selected clones. Imported nucleotide sequences from DNA sequence analysis are automatically translated into the corresponding amino acid sequences and displayed, offering numerous options for selecting reading frames, highlighting of sequence features and graphical layout of the MSA. The MSA complexity can be reduced by hiding the conserved nucleotide and/or amino acid residues, thus putting emphasis on the relevant mutated positions. ANTICALIgN is also able to handle suppressed stop codons or even to incorporate non-natural amino acids into a coding sequence. We demonstrate crucial functions of ANTICALIgN in an example of Anticalins selected from a lipocalin random library against the fibronectin extradomain B (ED-B), an established marker of tumor vasculature. Apart from engineered protein scaffolds, ANTICALIgN provides a powerful tool in the area of antibody engineering and for directed enzyme evolution. PMID:27261456

  4. Cloning and nucleotide sequencing of genes for three small, acid-soluble proteins from Bacillus subtilis spores.

    PubMed Central

    Connors, M J; Mason, J M; Setlow, P

    1986-01-01

    Three Bacillus subtilis genes (termed sspA, sspB, and sspD) which code for small, acid-soluble spore proteins (SASPs) have been cloned, and their complete nucleotide sequence has been determined. The amino acid sequences of the SASPs coded for by these genes are similar to each other and to those of the SASP-1 of B. subtilis (coded for by the sspC gene) and the SASP-A/C family of B. megaterium. The sspA and sspB genes are expressed only in sporulation, in parallel with each other and with the sspC gene. Two regions upstream of the postulated transcription start sites for the sspA and B genes have significant homology with the analogous regions of the sspC gene and the SASP-A/C gene family. Purification of two of the three major B. subtilis SASPs (alpha and beta) and determination of their amino-terminal sequences indicated that the sspA gene codes for SASP-alpha and that the sspB gene codes for SASP-beta. This was confirmed by the introduction of deletion mutations into the cloned sspA and sspB genes and transfer of these deletions into the B. subtilis chromosome with concomitant loss of the wild-type gene. PMID:3009398

  5. The nomenclature of 1-aminoalkylphosphonic acids and derivatives: evolution of the code system.

    PubMed

    Drabowicz, Józef; Jakubowski, Hieronim; Kudzin, Marcin H; Kudzin, Zbigniew H

    2015-01-01

    The approach for the unification of published proposals for the nomenclature and abbreviations of aminoalkylphosphonic acids and their derivatives is presented. Their modification was made on the basis of the IUPAC-IUB rules concerning the nomenclature and code system of proteinogenic amino acids. Our present proposal formulates the supplementary code and nomenclature system allowing unambiguous description of phosphonic analogs of proteinogenic amino acids, their analogs, homologs, metabolites, and derivatives including phosphonopeptides. PMID:25730210

  6. Nucleic acid sequence detection using multiplexed oligonucleotide PCR

    DOEpatents

    Nolan, John P.; White, P. Scott

    2006-12-26

    Methods for rapidly detecting single or multiple sequence alleles in a sample nucleic acid are described. Provided are all of the oligonucleotide pairs capable of annealing specifically to a target allele and discriminating among possible sequences thereof, and ligating to each other to form an oligonucleotide complex when a particular sequence feature is present (or, alternatively, absent) in the sample nucleic acid. The design of each oligonucleotide pair permits the subsequent high-level PCR amplification of a specific amplicon when the oligonucleotide complex is formed, but not when the oligonucleotide complex is not formed. The presence or absence of the specific amplicon is used to detect the allele. Detection of the specific amplicon may be achieved using a variety of methods well known in the art, including without limitation, oligonucleotide capture onto DNA chips or microarrays, oligonucleotide capture onto beads or microspheres, electrophoresis, and mass spectrometry. Various labels and address-capture tags may be employed in the amplicon detection step of multiplexed assays, as further described herein.

  7. The amino acid sequence of rabbit muscle triose phosphate isomerase.

    PubMed Central

    Corran, P H; Waley, S G

    1975-01-01

    The amino acid sequence of rabbit muscle triose phosphate isomerase was deduced by characterizing peptides that overlap the tryptic peptides. Thiol groups were modified by oxidation, carboxymethylation or aminoethylation. About 50 peptides that provided information about overlaps were isolated; the peptides were mostly characterized by their compositions and N-terminal residues. The peptide chains contain 248 amino acid residues, and no evidence for dissimilarity of the two subunits that comprise the native enzyme was found. The sequence of the rabbit muscle enzyme may be compared with that of the coelacanth enzyme (Kolb et al., 1974): 84% of the residues are in identical positions. Similarly, comparison of the sequence with that inferred for the chicken enzyme (Furth et al., 1974) shows that 87% of the residues are in identical positions. Limited though these comparisons are, they suggest that triose phosphate isomerase has one of the lowest rates of evolutionary change. An extended version of the present paper has been deposited as Supplementary Publication SUP 50040 (42 pages) at the British Library (Lending Division) (formerly the National Lending Library for Science and Technology), Boston Spa, Yorks. LS23 7BQ, U.K., from whom copies can be obtained on the terms given in Biochem. J. (1975) 145, 5. PMID:1171682

  8. The amino acid sequence of chymopapain from Carica papaya.

    PubMed Central

    Watson, D C; Yaguchi, M; Lynn, K R

    1990-01-01

    Chymopapain is a polypeptide of 218 amino acid residues. It has considerable structural similarity with papain and papaya proteinase omega, including conservation of the catalytic site and of the disulphide bonding. Chymopapain is like papaya proteinase omega in carrying four extra residues between papain positions 168 and 169, but differs from both papaya proteinases in the composition of its S2 subsite, as well as in having a second thiol group, Cys-117. Some evidence for the amino acid sequence of chymopapain has been deposited as Supplementary Publication SUP 50153 (12 pages) at the British Library Document Supply Centre, Boston Spa., Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms indicated in Biochem. J. (1990) 265, 5. The information comprises Supplement Tables 1-4, which contain, in order, amino acid compositions of peptides from tryptic, peptic, CNBr and mild acid cleavages, Supplement Fig. 1, showing re-fractionation of selected peaks from Fig. 2 of the main paper. Supplement Fig. 2, showing cation-exchange chromatography of the earliest-eluted peak of Fig. 3 of the main paper, Supplement Fig. 3, showing reverse-phase h.p.l.c. of the later-eluted peak from Fig. 3 of the main paper, and Supplement Fig. 4, showing the separation of peptides after mild acid hydrolysis of CNBr-cleavage fragment CB3. PMID:2106878

  9. The amino acid sequence of chymopapain from Carica papaya.

    PubMed

    Watson, D C; Yaguchi, M; Lynn, K R

    1990-02-15

    Chymopapain is a polypeptide of 218 amino acid residues. It has considerable structural similarity with papain and papaya proteinase omega, including conservation of the catalytic site and of the disulphide bonding. Chymopapain is like papaya proteinase omega in carrying four extra residues between papain positions 168 and 169, but differs from both papaya proteinases in the composition of its S2 subsite, as well as in having a second thiol group, Cys-117. Some evidence for the amino acid sequence of chymopapain has been deposited as Supplementary Publication SUP 50153 (12 pages) at the British Library Document Supply Centre, Boston Spa., Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms indicated in Biochem. J. (1990) 265, 5. The information comprises Supplement Tables 1-4, which contain, in order, amino acid compositions of peptides from tryptic, peptic, CNBr and mild acid cleavages, Supplement Fig. 1, showing re-fractionation of selected peaks from Fig. 2 of the main paper. Supplement Fig. 2, showing cation-exchange chromatography of the earliest-eluted peak of Fig. 3 of the main paper, Supplement Fig. 3, showing reverse-phase h.p.l.c. of the later-eluted peak from Fig. 3 of the main paper, and Supplement Fig. 4, showing the separation of peptides after mild acid hydrolysis of CNBr-cleavage fragment CB3. PMID:2106878

  10. Cloning and DNA sequence of the gene coding for Bacillus stearothermophilus T-6 xylanase.

    PubMed Central

    Gat, O; Lapidot, A; Alchanati, I; Regueros, C; Shoham, Y

    1994-01-01

    Bacillus stearothermophilus T-6 produces an extracellular thermostable xylanase. Affinity-purified polyclonal serum raised against the enzyme was used to screen a genomic library of B. stearothermophilus T-6 constructed in lambda-EMBL3. Two positive phages were isolated, both containing similar 13-kb inserts, and their lysates exhibited xylanase activity. A 3,696-bp SalI-BamHI fragment containing the xylanase gene was subcloned in Escherichia coli and subsequently sequenced. The open reading frame of xylanase T-6 consists of 1,236 bp. On the basis of sequence similarity, two possible -10 and -35 regions, a ribosome-binding site at the 5' end of the gene and a potential transcriptional termination motif at the 3' end of the gene, were identified. From the previously known N-terminal amino acid sequence of xylanase T-6 and the possible ribosome-binding site, a putative 28-amino-acid signal peptide was deduced. The mature xylanase T-6 consists of 379 amino acids with a calculated molecular weight and pI of 43,808 and 6.88, respectively. Multiple alignment of beta-glycanase amino acid sequences revealed highly conserved regions. Northern (RNA) blot analysis indicated that the xylanase T-6 transcript is about 1.4 kb and that the induction of this enzyme synthesis by xylose is on the transcriptional level. PMID:8031084

  11. Nucleic and amino acid sequences relating to a novel transketolase, and methods for the expression thereof

    DOEpatents

    Croteau, Rodney Bruce; Wildung, Mark Raymond; Lange, Bernd Markus; McCaskill, David G.

    2001-01-01

    cDNAs encoding 1-deoxyxylulose-5-phosphate synthase from peppermint (Mentha piperita) have been isolated and sequenced, and the corresponding amino acid sequences have been determined. Accordingly, isolated DNA sequences (SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7) are provided which code for the expression of 1-deoxyxylulose-5-phosphate synthase from plants. In another aspect the present invention provides for isolated, recombinant DXPS proteins, such as the proteins having the sequences set forth in SEQ ID NO:4, SEQ ID NO:6 and SEQ ID NO:8. In other aspects, replicable recombinant cloning vehicles are provided which code for plant 1-deoxyxylulose-5-phosphate synthases, or for a base sequence sufficiently complementary to at least a portion of 1-deoxyxylulose-5-phosphate synthase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding a plant 1-deoxyxylulose-5-phosphate synthase. Thus, systems and methods are provided for the recombinant expression of the aforementioned recombinant 1-deoxyxylulose-5-phosphate synthase that may be used to facilitate its production, isolation and purification in significant amounts. Recombinant 1-deoxyxylulose-5-phosphate synthase may be used to obtain expression or enhanced expression of 1-deoxyxylulose-5-phosphate synthase in plants in order to enhance the production of 1-deoxyxylulose-5-phosphate, or its derivatives such as isopentenyl diphosphate (IPP), or may be otherwise employed for the regulation or expression of 1-deoxyxylulose-5-phosphate synthase, or the production of its products.

  12. Adenovirus E1A coding sequences that enable ras and pmt oncogenes to transform cultured primary cells.

    PubMed Central

    Zerler, B; Moran, B; Maruyama, K; Moomaw, J; Grodzicker, T; Ruley, H E

    1986-01-01

    Plasmids expressing partial adenovirus early region 1A (E1A) coding sequences were tested for activities which facilitate in vitro establishment (immortalization) of primary baby rat kidney cells and which enable the T24 Harvey ras-related oncogene and the polyomavirus middle T antigen (pmt) gene to transform primary baby rat kidney cells. E1A cDNAs expressing the 289- and 243-amino acid proteins expressed both E1A transforming functions. Mutant hrA, which encodes a 140-amino acid protein derived from the amino-terminal domain shared by the 289- and 243-amino acid proteins, enabled ras (but not pmt) to transform and facilitated in vitro establishment to a low, but detectable, extent. These studies suggest that E1A functions which collaborate with ras oncogenes and those which facilitate establishment are linked. Furthermore, E1A transforming functions are not associated with activities of the 289-amino acid E1A proteins required for efficient transcriptional activation of viral early region promoters. PMID:3022137

  13. Genome Sequence Analysis of the Naphthenic Acid Degrading and Metal Resistant Bacterium Cupriavidus gilardii CR3.

    PubMed

    Wang, Xiaoyu; Chen, Meili; Xiao, Jingfa; Hao, Lirui; Crowley, David E; Zhang, Zhewen; Yu, Jun; Huang, Ning; Huo, Mingxin; Wu, Jiayan

    2015-01-01

    Cupriavidus sp. are generally heavy metal tolerant bacteria with the ability to degrade a variety of aromatic hydrocarbon compounds, although the degradation pathways and substrate versatilities remain largely unknown. Here we studied the bacterium Cupriavidus gilardii strain CR3, which was isolated from a natural asphalt deposit, and which was shown to utilize naphthenic acids as a sole carbon source. Genome sequencing of C. gilardii CR3 was carried out to elucidate possible mechanisms for the naphthenic acid biodegradation. The genome of C. gilardii CR3 was composed of two circular chromosomes chr1 and chr2 of respectively 3,539,530 bp and 2,039,213 bp in size. The genome for strain CR3 encoded 4,502 putative protein-coding genes, 59 tRNA genes, and many other non-coding genes. Many genes were associated with xenobiotic biodegradation and metal resistance functions. Pathway prediction for degradation of cyclohexanecarboxylic acid, a representative naphthenic acid, suggested that naphthenic acid undergoes initial ring-cleavage, after which the ring fission products can be degraded via several plausible degradation pathways including a mechanism similar to that used for fatty acid oxidation. The final metabolic products of these pathways are unstable or volatile compounds that were not toxic to CR3. Strain CR3 was also shown to have tolerance to at least 10 heavy metals, which was mainly achieved by self-detoxification through ion efflux, metal-complexation and metal-reduction, and a powerful DNA self-repair mechanism. Our genomic analysis suggests that CR3 is well adapted to survive the harsh environment in natural asphalts containing naphthenic acids and high concentrations of heavy metals. PMID:26301592

  14. Genome Sequence Analysis of the Naphthenic Acid Degrading and Metal Resistant Bacterium Cupriavidus gilardii CR3

    PubMed Central

    Xiao, Jingfa; Hao, Lirui; Crowley, David E.; Zhang, Zhewen; Yu, Jun; Huang, Ning; Huo, Mingxin; Wu, Jiayan

    2015-01-01

    Cupriavidus sp. are generally heavy metal tolerant bacteria with the ability to degrade a variety of aromatic hydrocarbon compounds, although the degradation pathways and substrate versatilities remain largely unknown. Here we studied the bacterium Cupriavidus gilardii strain CR3, which was isolated from a natural asphalt deposit, and which was shown to utilize naphthenic acids as a sole carbon source. Genome sequencing of C. gilardii CR3 was carried out to elucidate possible mechanisms for the naphthenic acid biodegradation. The genome of C. gilardii CR3 was composed of two circular chromosomes chr1 and chr2 of respectively 3,539,530 bp and 2,039,213 bp in size. The genome for strain CR3 encoded 4,502 putative protein-coding genes, 59 tRNA genes, and many other non-coding genes. Many genes were associated with xenobiotic biodegradation and metal resistance functions. Pathway prediction for degradation of cyclohexanecarboxylic acid, a representative naphthenic acid, suggested that naphthenic acid undergoes initial ring-cleavage, after which the ring fission products can be degraded via several plausible degradation pathways including a mechanism similar to that used for fatty acid oxidation. The final metabolic products of these pathways are unstable or volatile compounds that were not toxic to CR3. Strain CR3 was also shown to have tolerance to at least 10 heavy metals, which was mainly achieved by self-detoxification through ion efflux, metal-complexation and metal-reduction, and a powerful DNA self-repair mechanism. Our genomic analysis suggests that CR3 is well adapted to survive the harsh environment in natural asphalts containing naphthenic acids and high concentrations of heavy metals. PMID:26301592

  15. Rapid Quantification of Mutant Fitness in Diverse Bacteria by Sequencing Randomly Bar-Coded Transposons

    PubMed Central

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; Lamson, Jacob S.; He, Jennifer; Hoover, Cindi A.; Blow, Matthew J.; Bristow, James; Butland, Gareth

    2015-01-01

    Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative d-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. PMID:25968644
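
    The bar-code counting step described in this abstract can be sketched in a few lines. The abstract does not give the fitness formula, so the sketch assumes a simple summary: the log2 ratio of a bar code's abundance after growth to its abundance before growth, with a pseudocount to avoid division by zero. The function names, bar codes and read lists are hypothetical, and the normalization steps of a real pipeline are omitted.

```python
import math
from collections import Counter


def count_barcodes(reads, known_barcodes):
    """Count reads whose bar code matches one mapped earlier by TnSeq."""
    return Counter(read for read in reads if read in known_barcodes)


def strain_fitness(before, after, pseudocount=1.0):
    """Per-bar-code log2 ratio of abundance after vs. before growth.
    (Assumed summary statistic; not taken from the abstract.)"""
    return {
        barcode: math.log2(
            (after.get(barcode, 0) + pseudocount) / (before[barcode] + pseudocount)
        )
        for barcode in before
    }


# Toy data: two mapped bar codes plus one unmapped read that is ignored.
known = {"AAAC", "GGTT"}
before_counts = count_barcodes(["AAAC"] * 7 + ["GGTT"] * 3 + ["NNNN"], known)
after_counts = count_barcodes(["AAAC"] * 1 + ["GGTT"] * 15, known)

fitness = strain_fitness(before_counts, after_counts)
print(fitness)
```

    In this toy example the strain carrying AAAC is depleted (negative value) and the strain carrying GGTT is enriched (positive value). Because bar codes are mapped to insertion sites only once per organism, each later assay needs only this kind of counting.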

  16. Common sequence motifs coding for higher-plant and prokaryotic O-acetylserine (thiol)-lyases: bacterial origin of a chloroplast transit peptide?

    PubMed

    Rolland, N; Job, D; Douce, R

    1993-08-01

    A comparison of the amino acid sequence of O-acetylserine (thiol)-lyase (EC 4.2.99.8) from Escherichia coli and the isoforms of this enzyme found in the cytosolic and chloroplastic compartments of spinach (Spinacia oleracea) leaf cells allows the essential lysine residue involved in the binding of the pyridoxal 5'-phosphate cofactor to be identified. The results of further sequence comparison of cDNAs coding for these proteins are discussed in the frame of the endosymbiotic theory of chloroplast evolution. The results are compatible with a mechanism in which the chloroplast enzyme originated from the cytosolic enzyme and both plant genes originated from a common prokaryotic ancestor. The comparison also suggests that the 5'-non-coding sequence of the bacterial gene was transferred to the plant cell nucleus and that it has been used to create the N-terminal portions of both plant enzymes, and possibly the transit peptide of the chloroplast enzyme. PMID:7916619

  17. Polymorphism and haplotype structure in River Buffalo (Bubalus bubalis) toll-like receptor 5 (TLR5) coding sequence.

    PubMed

    Jones, B C; Womack, J E

    2012-04-01

    Most of the 160 million river buffalo in the world are in Asia where they are used extensively, both as a food source and for draught power. Only recently have investigations begun exploring the buffalo genome for variation that might influence health and productivity of these economically important animals. This paper describes the sequence variability of the toll-like receptor 5 (TLR5) gene, which recognizes bacterial flagellin and is a key player in the immune system. TLR5 is comprised of a single exon that is 2577 bp and codes 858 amino acids. We examined single-nucleotide polymorphisms (SNPs) located within the coding region. Overall, 17 SNPs were discovered, seven of which are non-synonymous. Our study population yielded four different haplotypes. We examined predicted protein domain structure and found that river buffalo, swamp buffalo, and African Forest buffalo shared the same protein domain structure and are more similar to each other than they are to cattle and American bison, which are similar to each other. PolyPhen 2 analysis revealed one amino acid substitution in the river buffalo population with potential functional significance. PMID:22537062
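
    The exon length and protein length quoted in this abstract are consistent if the 2577 bp include the stop codon. A small check, with a hypothetical helper name, makes the arithmetic explicit:

```python
def residues_from_cds_length(cds_bp, includes_stop=True):
    """Number of amino acids encoded by a coding sequence of `cds_bp` bases.
    Assumes the length covers start codon through stop codon when
    `includes_stop` is True."""
    if cds_bp % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = cds_bp // 3
    return codons - 1 if includes_stop else codons


tlr5_residues = residues_from_cds_length(2577)
print(tlr5_residues)
```

    2577 bp correspond to 859 codons, that is 858 amino acids plus one stop codon.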

  18. Cloning and sequencing of a gene coding for an actin binding protein of Saccharomyces exiguus.

    PubMed

    Lange, U; Steiner, S; Grolig, F; Wagner, G; Philippsen, P

    1994-03-01

    The actin binding protein Abp1p of the yeast Saccharomyces cerevisiae is thought to be involved in the spatial organisation of cell surface growth. It contains a potential actin binding domain and an SH-3 region, a common motif of many signal transduction proteins [1]. We have cloned and sequenced an ABP1 homologous gene of Saccharomyces exiguus, a yeast which is only distantly related to S. cerevisiae. The protein encoded by this gene is slightly larger than the respective S. cerevisiae protein (617 versus 592 amino acids). The two genes are 67.4% identical and the deduced amino acid sequences share an overall identity of 59.8%. The most conserved regions are the 148 N-terminal amino acids containing the potential actin binding site and the 58 C-terminal amino acids including the SH3 domain. In addition, both proteins contain a repeated motif of unknown function which is rich in glutamic acids with the sequence EEEEEEEAPAPSLPSR in the S. exiguus Abp1p. PMID:8110838

  19. Picture quality measurement based on block visibility in discrete cosine transform-coded video sequences

    NASA Astrophysics Data System (ADS)

    Coudoux, Francois-Xavier; Gazalet, Marc G.; Derviaux, Christian; Corlay, Patrick

    2001-04-01

    In this paper, we present a perceptual measure that predicts the visibility of the well-known blocking effect in discrete cosine transform coded image sequences. The main objective of this work is to use the results of the measure derived for adaptive video postprocessing, in order to significantly improve the visual quality of the video decoded sequences at the receiver. The proposed measure is based on a visual model accounting for both the spatial and temporal properties of the human visual system. The input of the visual model is the distorted sequence only. Psychovisual experiments have been carried out to determine the eye sensitivity to blocking artifacts, by varying a number of visually significant parameters: background level, spatial, and temporal activities in the surrounding image. Results obtained for the measurement of the visibility thresholds enable us to estimate the model parameters. The visual model is finally applied to real coded video sequences. The comparison of measurement results with subjective tests shows that the proposed perceptual measure has a good correlation with subjective evaluation.

  20. The sequence of rat leukosialin (W3/13 antigen) reveals a molecule with O-linked glycosylation of one third of its extracellular amino acids.

    PubMed Central

    Killeen, N; Barclay, A N; Willis, A C; Williams, A F

    1987-01-01

    Leukosialin is one of the major glycoproteins of thymocytes and T lymphocytes and is notable for a very high content of O-linked carbohydrate structures. The full protein sequence for rat leukosialin as translated from cDNA clones is now reported. The molecule contains 371 amino acids with 224 residues outside the cell, one transmembrane sequence and 124 cytoplasmic residues. Data from the peptide sequence and carbohydrate composition suggest that one in three of the extracellular amino acids may be O-glycosylated with no N-linked glycosylation sites. The cDNA sequence contained a CpG rich region in the 3' coding sequence and a large 3' non-coding region which included tandem repeats of the sequence GGAT. PMID:2965006

  1. A direct sequence spread spectrum code acquisition circuit for wireless sensor networks

    NASA Astrophysics Data System (ADS)

    Ghaisari, Jafar; Ferdosi, Arash

    2011-06-01

    Narrow band (NB), spread spectrum (SS), and ultra wide band (UWB) are three physical layer bandwidth types used in wireless sensor networks (WSN). SS and UWB technologies have many advantages over NB, which make them preferable for WSN. Synchronisation of different nodes in a WSN is an important task that is necessary to improve cooperation and lifetime of nodes. Code acquisition is the main step of a node's time synchronisation. In this article, a pseudo noise code generator and a code acquisition circuit are proposed, designed and tested using the direct sequence SS technique. To investigate the properties of the designed circuits, simulations are carried out via Xilinx Foundation Series software in the real mode. The results demonstrate excellent performance of the proposed algorithms and circuits in all realistic conditions. The code acquisition circuit uses an adaptive testing window for the single-dwell serial search method. It is a clock-phase-free approach, so the clock coherency step is eliminated. Moreover, a clock phase difference between transmitter and receiver nodes has little effect on acquisition and thus on synchronisation time.
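
    The two building blocks named in this abstract, a pseudo noise (PN) code generator and a serial-search code acquisition step, can be sketched in software. This is a generic illustration only: the 5-stage shift register, its feedback taps, the threshold and the function names are assumptions, and the adaptive testing window and hardware design of the paper are not reproduced.

```python
def lfsr_pn_sequence(taps, state, length):
    """Generate `length` chips (0/1) from a Fibonacci linear feedback shift
    register. `taps` are 1-indexed register stages XORed to form the
    feedback bit; `state` is the non-zero initial register content."""
    reg = list(state)
    chips = []
    for _ in range(length):
        chips.append(reg[-1])          # output the last stage
        feedback = 0
        for tap in taps:
            feedback ^= reg[tap - 1]
        reg = [feedback] + reg[:-1]    # shift right, insert feedback
    return chips


def serial_search(received, code, threshold):
    """Single-dwell serial search: test each code phase in turn and return
    the first phase whose correlation with `received` reaches `threshold`."""
    n = len(code)
    bipolar = [1 - 2 * chip for chip in code]   # map 0/1 to +1/-1
    for phase in range(n):
        corr = sum(received[i] * bipolar[(i + phase) % n] for i in range(n))
        if corr >= threshold:
            return phase
    return None


# Assumed example: 5-stage register with taps (5, 3), period 31 chips.
code = lfsr_pn_sequence(taps=(5, 3), state=[1, 0, 0, 0, 0], length=31)

# Noise-free received signal: the same code delayed by an unknown phase.
true_phase = 11
received = [1 - 2 * code[(i + true_phase) % 31] for i in range(31)]

found = serial_search(received, code, threshold=25)
print(found)
```

    For a maximal-length code the correlation is 31 at the correct phase and -1 at every other phase, which is why a single threshold test per phase is enough in this noise-free toy case.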

  2. Amino acid sequence prerequisites for the formation of cn ions.

    PubMed

    Downard, K M; Biemann, K

    1993-11-01

    Amino acid sequence prerequisites are described for the formation of cn ions observed in high-energy collision-induced decomposition spectra of peptides. It is shown that the formation of cn ions is promoted by the nature of the amino acid C-terminal to the cleavage site. A propensity for cn cleavage preceding threonine, and to a lesser extent tryptophan, lysine, and serine, is demonstrated where fragmentation is directed N-terminally at these residues. In addition, the nature of the residue N-terminal to the cleavage site is shown to have little effect on cn ion formation. A mechanism for cn ion formation is proposed and its applicability to the results observed is discussed. PMID:24227531

  3. Ultrasensitive nucleic acid sequence detection by single-molecule electrophoresis

    SciTech Connect

    Castro, A; Shera, E.B.

    1996-09-01

    This is the final report of a one-year laboratory-directed research and development project at Los Alamos National Laboratory. There has been considerable interest in the development of very sensitive clinical diagnostic techniques over the last few years. Many pathogenic agents are often present in extremely small concentrations in clinical samples, especially at the initial stages of infection, making their detection very difficult. This project sought to develop a new technique for the detection and accurate quantification of specific bacterial and viral nucleic acid sequences in clinical samples. The scheme involved the use of novel hybridization probes for the detection of nucleic acids combined with our recently developed technique of single-molecule electrophoresis. This project is directly relevant to the DOE's Defense Programs strategic directions in the area of biological warfare counter-proliferation.

  4. Genomic DNA sequence of a rice gene coding for a pullulanase-type of starch debranching enzyme.

    PubMed

    Francisco, P B; Zhang, Y; Park, S Y; Ogata, N; Yamanouchi, H; Nakamura, Y

    1998-09-01

    A genomic DNA containing a rice (Oryza sativa L., cv. Norin-8) gene coding for a pullulanase-type starch debranching enzyme (EC 3.2.1.41) was sequenced (EMBL/GenBank/DDBJ accession number AB012915). Along the 15,248 bp DNA, the pullulanase gene is split into 26 exons. The four pullulanase consensus regions are positioned in the middle portion of the sequence and are separated by long introns and 1-3 exons. Comparison of the rice cv. Norin-8 pullulanase genomic structure with that of barley pullulanase (limit dextrinase) (F. Lok et al., EMBL/GenBank/DDBJ accession number AF022725) indicates that most of the pullulanase exons are highly conserved. Alignment of the nucleotide bases of rice exon 8 with those of barley exon 8-intron 8-exon 9 fragment suggests that the 85 bp internal sequence of rice exon 8 was originally an intron, a possibility further indicated by the absence in barley and spinach (A. Renz et al., EMBL/GenBank/DDBJ accession number X83969) pullulanases of amino acid residues encoded by the 85 bp fragment. PMID:9748665

  5. The untranslated side of hair and skin mammalian pigmentation: Beyond coding sequences.

    PubMed

    Rouzaud, Francois; Oulmouden, Ahmad; Kos, Lidia

    2010-05-01

    For several decades, tremendous advances in studying skin and hair pigmentation of mammals have been made using Mendelian genetics principles. A number of loci and their associated traits have been extensively examined, crossings performed, and phenotypes well documented. Continuously improving PCR techniques allowed the molecular cloning and sequencing of the first pigmentation genes at the end of the 20th century, a period followed by an intense effort to detect and describe polymorphisms in the coding regions and correlate allelic combinations with the observed melanogenic phenotypes. However, a number of phenotypes and biological events could not be elucidated solely by analysis of the coding regions of genes. Messenger RNA isolation, characterization and quantification techniques allowed groups to move ahead and investigate molecular mechanisms whose secrets lay within the noncoding regions of pigmentation gene transcripts such as MC1R, ASIP, or Mitf. The untranslated elements contain specific nucleotide sequences and structures that dramatically influence the mRNA half-life and processing, thus impacting protein translation and melanin production. As we are progressively uncovering the complex processes regulating melanocyte biology, unraveling complete mRNA structures and understanding mechanisms beyond coding regions has become critical. PMID:20222017

  6. MIMO Radar System for Respiratory Monitoring Using Tx and Rx Modulation with M-Sequence Codes

    NASA Astrophysics Data System (ADS)

    Miwa, Takashi; Ogiwara, Shun; Yamakoshi, Yoshiki

    The importance of respiratory monitoring systems during sleep has increased due to early diagnosis of sleep apnea syndrome (SAS) in the home. This paper presents a simple respiratory monitoring system suitable for home use having 3D ranging of targets. The range resolution and azimuth resolution are obtained by a stepped frequency transmitting signal and MIMO arrays with preferred pair M-sequence codes doubly modulating in transmission and reception, respectively. Due to the use of these codes, Gold sequence codes corresponding to all the antenna combinations are equivalently modulated in the receiver. The signal to interchannel interference ratio of the reconstructed image is evaluated by numerical simulations. The results of experiments on a developed prototype 3D-MIMO radar system show that this system can extract only the motion of respiration of a human subject 2 m apart from a metallic rotatable reflector. Moreover, it is found that this system can successfully measure the respiration information of sleeping human subjects for 96.6 percent of the whole measurement time except for instances of large posture change.
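
    The statement that modulating with one M-sequence on transmit and its preferred-pair partner on receive is equivalent to Gold code modulation can be illustrated numerically. The sketch below is an assumption-laden toy, not the paper's configuration: the code length (31 chips), the tap sets (5, 2) and (5, 4, 3, 2), and all function names are chosen here for illustration only.

```python
def lfsr_sequence(taps, nbits, seed=1):
    """Return one period (2**nbits - 1 chips) of a Fibonacci LFSR output."""
    state = [(seed >> i) & 1 for i in range(nbits)]
    chips = []
    for _ in range(2 ** nbits - 1):
        chips.append(state[-1])
        feedback = 0
        for t in taps:                     # XOR of the tapped stages (1-indexed)
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return chips


def cross_correlation(a, b):
    """Periodic cross-correlation of two 0/1 sequences in +1/-1 form."""
    n = len(a)
    return [
        sum((1 - 2 * a[i]) * (1 - 2 * b[(i + k) % n]) for i in range(n))
        for k in range(n)
    ]


# Assumed preferred pair of length-31 M-sequences.
u = lfsr_sequence((5, 2), 5)               # e.g. the transmit-side code
v = lfsr_sequence((5, 4, 3, 2), 5)         # e.g. the receive-side code


def gold_code(shift):
    """Gold code formed by XOR of u with a cyclic shift of v."""
    return [u[i] ^ v[(i + shift) % 31] for i in range(31)]


# Multiplying the two +1/-1 modulations chip by chip equals XOR of the bits,
# so the doubly modulated signal carries a Gold code.
shift = 3
product = [(1 - 2 * u[i]) * (1 - 2 * v[(i + shift) % 31]) for i in range(31)]
assert product == [1 - 2 * g for g in gold_code(shift)]

values = sorted(set(cross_correlation(u, v)))
print(values)
```

    For a preferred pair the cross-correlation takes only a few small values, which is what keeps interchannel interference between antenna combinations bounded.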

  7. First draft genome sequencing of indole acetic acid producing and plant growth promoting fungus Preussia sp. BSL10.

    PubMed

    Khan, Abdul Latif; Asaf, Sajjad; Khan, Abdur Rahim; Al-Harrasi, Ahmed; Al-Rawahi, Ahmed; Lee, In-Jung

    2016-05-10

    Preussia sp. BSL10, family Sporormiaceae, was actively producing phytohormone (indole-3-acetic acid) and extra-cellular enzymes (phosphatases and glucosidases). The fungus was also promoting the growth of the arid-land tree Boswellia sacra. Looking at such prospects of this fungus, we sequenced its draft genome for the first time. The Illumina-based sequence analysis reveals an approximate genome size of 31.4 Mbp for Preussia sp. BSL10. Based on ab initio gene prediction, a total of 32,312 coding sequences were annotated consisting of 11,967 coding genes, pseudogenes, and 221 tRNA genes. Furthermore, 321 carbohydrate-active enzymes were predicted and classified into many functional families. PMID:26995610

  8. Recurrent Coding Sequence Variation Explains Only A Small Fraction of the Genetic Architecture of Colorectal Cancer

    PubMed Central

    Timofeeva, Maria N.; Kinnersley, Ben; Farrington, Susan M.; Whiffin, Nicola; Palles, Claire; Svinti, Victoria; Lloyd, Amy; Gorman, Maggie; Ooi, Li-Yin; Hosking, Fay; Barclay, Ella; Zgaga, Lina; Dobbins, Sara; Martin, Lynn; Theodoratou, Evropi; Broderick, Peter; Tenesa, Albert; Smillie, Claire; Grimes, Graeme; Hayward, Caroline; Campbell, Archie; Porteous, David; Deary, Ian J.; Harris, Sarah E.; Northwood, Emma L.; Barrett, Jennifer H.; Smith, Gillian; Wolf, Roland; Forman, David; Morreau, Hans; Ruano, Dina; Tops, Carli; Wijnen, Juul; Schrumpf, Melanie; Boot, Arnoud; Vasen, Hans F A; Hes, Frederik J.; van Wezel, Tom; Franke, Andre; Lieb, Wolgang; Schafmayer, Clemens; Hampe, Jochen; Buch, Stephan; Propping, Peter; Hemminki, Kari; Försti, Asta; Westers, Helga; Hofstra, Robert; Pinheiro, Manuela; Pinto, Carla; Teixeira, Manuel; Ruiz-Ponte, Clara; Fernández-Rozadilla, Ceres; Carracedo, Angel; Castells, Antoni; Castellví-Bel, Sergi; Campbell, Harry; Bishop, D. Timothy; Tomlinson, Ian P M; Dunlop, Malcolm G.; Houlston, Richard S.

    2015-01-01

    Whilst common genetic variation in many non-coding genomic regulatory regions is known to impart risk of colorectal cancer (CRC), much of the heritability of CRC remains unexplained. To examine the role of recurrent coding sequence variation in CRC aetiology, we genotyped 12,638 CRC cases and 29,045 controls from six European populations. Single-variant analysis identified a coding variant (rs3184504) in SH2B3 (12q24) associated with CRC risk (OR = 1.08, P = 3.9 × 10^−7), and novel damaging coding variants in 3 genes previously tagged by GWAS efforts; rs16888728 (8q24) in UTP23 (OR = 1.15, P = 1.4 × 10^−7); rs6580742 and rs12303082 (12q13) in FAM186A (OR = 1.11, P = 1.2 × 10^−7 and OR = 1.09, P = 7.4 × 10^−8); rs1129406 (12q13) in ATF1 (OR = 1.11, P = 8.3 × 10^−9), all reaching exome-wide significance levels. Gene-based tests identified associations between CRC and PCDHGA genes (P < 2.90 × 10^−6). We found an excess of rare, damaging variants in base-excision (P = 2.4 × 10^−4) and DNA mismatch repair genes (P = 6.1 × 10^−4) consistent with a recessive mode of inheritance. This study comprehensively explores the contribution of coding sequence variation to CRC risk, identifying associations with coding variation in 4 genes and the PCDHG gene cluster and several candidate recessive alleles. However, these findings suggest that recurrent, low-frequency coding variants account for a minority of the unexplained heritability of CRC. PMID:26553438

  9. Characterization and differential expression analysis of artichoke phenylalanine ammonia-lyase-coding sequences.

    PubMed

    De Paolis, Angelo; Pignone, Domenico; Morgese, Anita; Sonnante, Gabriella

    2008-01-01

    Sequences encoding phenylalanine ammonia-lyase were isolated from artichoke, by using a sequence homology strategy, by screening a genomic library and by 3'-rapid amplification of cDNA ends (RACE) technology. These analyses and Southern blots suggested that, in artichoke, phenylalanine ammonia-lyase (PAL) is encoded by a small gene family. The sequences isolated from genomic DNA possess two exons and one intron at the conserved position as in most plant pal genes characterized to date. The 3'-RACE analysis also indicated that each member of the artichoke pal gene family was present as a pool of transcripts, differing in the length of the 3'-untranslated region. The deduced amino acid sequences were highly similar to those of PAL from lettuce and sunflower. One of the artichoke pal genes was completely sequenced, and its 5' upstream region contained TATA, CAAT box and cis regulatory elements identified in other phenylpropanoid pathway genes as playing a role in UV and elicitor induction. The expression of three of the identified artichoke pal sequences was evaluated in different plant parts, in developmental stages and after wounding, using gene-specific primers/probe combinations in real-time polymerase chain reaction assays. The three putative genes were differentially expressed in the plant parts analysed and were developmentally regulated. Moreover, after leaf mechanical injury, all of them were differentially regulated. The possible involvement of the single pal genes in different physiological processes is discussed. PMID:18251868

  10. Not an inside job: non-coded amino acids compromise the genetic code

    PubMed Central

    Ribas de Pouplana, Lluís

    2014-01-01

    The sophistication of the editing mechanisms that prevent gene translation errors indicates that amino acid misincorporation is generally a problem to be avoided. Mistranslation is considered invariably deleterious and often caused by confusion between similar proteogenic amino acids. These views are being challenged. The evidence linking misincorporation of dietary non-proteogenic amino acids to human disease continues to grow, and a report in this issue of The EMBO Journal demonstrates the importance of preventing non-proteogenic amino acid misincorporation for cellular homeostasis (Cvetesic et al, 2014). PMID:24952895

  11. [Role of non-coding regulatory ribonucleic acids in chronic inflammatory diseases].

    PubMed

    Heinz, G A; Mashreghi, M-F

    2016-05-01

    Non-coding regulatory ribonucleic acids (RNA), including microRNA, long non-coding RNA and circular RNA, can influence the expression of genes mediating inflammatory processes and therefore affect the course and progression of chronic inflammatory diseases. Recent studies using antisense oligonucleotides suggest that such non-coding regulatory RNAs are suitable as novel therapeutic target molecules for the treatment of inflammatory rheumatic diseases. PMID:27115697

  12. Molecular cloning and sequence analysis of the gene coding for the 57-kDa major soluble antigen of the salmonid fish pathogen Renibacterium salmoninarum.

    PubMed

    Chien, M S; Gilbert, T L; Huang, C; Landolt, M L; O'Hara, P J; Winton, J R

    1992-09-15

    The complete sequence coding for the 57-kDa major soluble antigen of the salmonid fish pathogen, Renibacterium salmoninarum, was determined. The gene contained an open reading frame of 1671 nucleotides coding for a protein of 557 amino acids with a calculated M(r) value of 57,190. The first 26 amino acids constituted a signal peptide. The deduced sequence for amino acid residues 27-61 was in agreement with the 35 N-terminal amino acid residues determined by microsequencing, suggesting the protein is synthesized as a 557-amino acid precursor and processed to produce a mature protein of M(r) 54,505. Two regions of the protein contained imperfect direct repeats. The first region contained two copies of an 81-residue repeat, the second contained five copies of an unrelated 25-residue repeat. Also, a perfect inverted repeat (including three in-frame UAA stop codons) was observed at the carboxyl-terminus of the gene. PMID:1383085
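
    The figures quoted in this abstract can be cross-checked with simple arithmetic. The short sketch below only recomputes values stated above; the variable names are illustrative, and the length of the mature protein is derived here, not quoted from the record.

```python
orf_nt = 1671                      # open reading frame length in nucleotides
precursor_aa = orf_nt // 3         # three nucleotides per codon
signal_peptide_aa = 26             # residues 1-26 form the signal peptide

# Residues 27-61 are the stretch confirmed by N-terminal microsequencing.
sequenced_residues = range(27, 61 + 1)

# Derived value (not stated in the abstract): length after signal cleavage.
mature_aa = precursor_aa - signal_peptide_aa

# Mass removed on processing, from the two quoted M(r) values.
mass_drop = 57190 - 54505

print(precursor_aa, len(sequenced_residues), mature_aa, mass_drop)
```

    The ORF length divides evenly into 557 codons, and residues 27-61 span exactly 35 positions, so the numbers in the abstract are mutually consistent.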

  13. Cloning and sequence analysis of the coding sequence of β-actin cDNA from the Chinese alligator and suitable internal reference primers from the β-actin gene.

    PubMed

    Zhu, H N; Zhang, S Z; Zhou, Y K; Wang, C L; Wu, X B

    2015-01-01

    β-Actin is an essential component of the cytoskeleton and is stably expressed in various tissues of animals, thus, it is commonly used as an internal reference for gene expression studies. In this study, a 1731-bp fragment of β-actin cDNA from Alligator sinensis was obtained using the homology cloning technique. Sequence analysis showed that this fragment contained the complete coding sequence of the β-actin gene (1128 bp), encoding 375 amino acids. The amino acid sequence of β-actin is highly conserved and its nucleotide sequence is slightly variable. Multiple alignment analyses showed that the nucleotide sequence of the β-actin gene from A. sinensis is very similar to sequences from birds, with 94-95% identity. Ten pairs of primers with different product sizes and different annealing temperatures were screened by PCR amplification, agarose gel electrophoresis, and DNA sequencing, and could be used as internal reference primers in gene expression studies. This study expands our knowledge of β-actin gene phylogenetic evolution and provides a basis for quantitative gene expression studies in A. sinensis. PMID:26505364

  14. Evolutionary and sequence-based relationships in bacterial AdoMet-dependent non-coding RNA methyltransferases

    PubMed Central

    2014-01-01

    Background: RNA post-transcriptional modification is an exciting field of research that has revealed this editing process to be a sophisticated epigenetic mechanism to fine-tune the ribosome function and to control gene expression. Although tRNA modifications seem to be more relevant for the ribosome function and cell physiology as a whole, some rRNA modifications have also been seen to play pivotal roles, essentially those located in central ribosome regions. RNA methylation at nucleobases and ribose moieties of nucleotides appears to frequently modulate its chemistry and structure. RNA methyltransferases comprise a superfamily of highly specialized enzymes that accomplish a wide variety of modifications. These enzymes exhibit a poor degree of sequence similarity in spite of using a common reaction cofactor and modifying the same substrate type. Results: Relationships and lineages of RNA methyltransferases have been extensively discussed, but no consensus has been reached. To shed light on this topic, we performed amino acid and codon-based sequence analyses to determine phylogenetic relationships and molecular evolution. We found that most Class I RNA MTases are evolutionarily related to protein and cofactor/vitamin biosynthesis methyltransferases. Additionally, we found that at least nine lineages explain the diversity of RNA MTases. We showed that RNA methyltransferases have a high content of polar and positively charged amino acids, which coincides with the electrochemistry of their substrates. Conclusions: After studying almost 12,000 bacterial genomes and 2,000 patho-pangenomes, we revealed that molecular evolution of Class I methyltransferases matches the different rates of synonymous and non-synonymous substitutions along the coding region. Consequently, evolution of Class I methyltransferases selects against amino acid changes affecting the structure conformation. PMID:25012753

  15. Detection of almond allergen coding sequences in processed foods by real time PCR.

    PubMed

    Prieto, Nuria; Iniesto, Elisa; Burbano, Carmen; Cabanillas, Beatriz; Pedrosa, Mercedes M; Rovira, Mercè; Rodríguez, Julia; Muzquiz, Mercedes; Crespo, Jesus F; Cuadrado, Carmen; Linacero, Rosario

    2014-06-18

    The aim of this work was to develop and analytically validate a quantitative RT-PCR method, using novel primer sets designed on Pru du 1, Pru du 3, Pru du 4, and Pru du 6 allergen-coding sequences, and contrast the sensitivity and specificity of these probes. The influence of temperature and/or pressure processing on the ability to detect these almond allergen targets was also analyzed. All primers allowed a specific and accurate amplification of these sequences. The specificity was assessed by amplifying DNA from almond, different Prunus species and other common plant food ingredients. The detection limit was 1 ppm in unprocessed almond kernels. The method's robustness and sensitivity were confirmed using spiked samples. Thermal treatment under pressure (autoclave) reduced the yield and amplifiability of almond DNA; however, high-hydrostatic pressure treatments did not produce such effects. Compared with ELISA assay outcomes, this RT-PCR showed higher sensitivity to detect almond traces in commercial foodstuffs. PMID:24857239

  16. A versatile palindromic amphipathic repeat coding sequence horizontally distributed among diverse bacterial and eucaryotic microbes

    PubMed Central

    2010-01-01

    HGT and intra-genomic shuffling. Conclusions We describe novel features of PARCELs (Palindromic Amphipathic Repeat Coding ELements), a set of widely distributed repeat protein domains and coding sequences that were likely acquired through HGT by diverse unicellular microbes, further mobilized and diversified within genomes, and co-opted for expression in the membrane proteome of some taxa. Disseminated by multiple gene-centric vehicles, ORFs harboring these elements enhance accessory gene pools as part of the "mobilome" connecting genomes of various clades, in taxa sharing common niches. PMID:20626840

  17. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... in the sequence. (4) The enumeration of amino acids may start at the first amino acid of the first..., counting backwards starting with the amino acid next to number 1. Otherwise, the enumeration of amino acids... sequence every 5 amino acids. The enumeration method for amino acid sequences that is set forth......

  18. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... in the sequence. (4) The enumeration of amino acids may start at the first amino acid of the first..., counting backwards starting with the amino acid next to number 1. Otherwise, the enumeration of amino acids... sequence every 5 amino acids. The enumeration method for amino acid sequences that is set forth......

  19. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... in the sequence. (4) The enumeration of amino acids may start at the first amino acid of the first..., counting backwards starting with the amino acid next to number 1. Otherwise, the enumeration of amino acids... sequence every 5 amino acids. The enumeration method for amino acid sequences that is set forth......

  20. Whole-Exome Sequencing Identifies Rare and Low-Frequency Coding Variants Associated with LDL Cholesterol

    PubMed Central

    Lange, Leslie A.; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M.; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M.; Smith, Joshua D.; Turner, Emily H.; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A.; Holmen, Oddgeir L.; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A.; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C.; Correa, Adolfo; Griswold, Michael E.; Jakobsdottir, Johanna; Smith, Albert V.; Schreiner, Pamela J.; Feitosa, Mary F.; Zhang, Qunyuan; Huffman, Jennifer E.; Crosby, Jacy; Wassel, Christina L.; Do, Ron; Franceschini, Nora; Martin, Lisa W.; Robinson, Jennifer G.; Assimes, Themistocles L.; Crosslin, David R.; Rosenthal, Elisabeth A.; Tsai, Michael; Rieder, Mark J.; Farlow, Deborah N.; Folsom, Aaron R.; Lumley, Thomas; Fox, Ervin R.; Carlson, Christopher S.; Peters, Ulrike; Jackson, Rebecca D.; van Duijn, Cornelia M.; Uitterlinden, André G.; Levy, Daniel; Rotter, Jerome I.; Taylor, Herman A.; Gudnason, Vilmundur; Siscovick, David S.; Fornage, Myriam; Borecki, Ingrid B.; Hayward, Caroline; Rudan, Igor; Chen, Y. Eugene; Bottinger, Erwin P.; Loos, Ruth J.F.; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M.; Gabriel, Stacey B.; O’Donnell, Christopher J.; Post, Wendy S.; North, Kari E.; Reiner, Alexander P.; Boerwinkle, Eric; Psaty, Bruce M.; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P.; Cupples, L. Adrienne; Kooperberg, Charles; Wilson, James G.; Nickerson, Deborah A.; Abecasis, Goncalo R.; Rich, Stephen S.; Tracy, Russell P.; Willer, Cristen J.; Gabriel, Stacey B.; Altshuler, David M.; Abecasis, Gonçalo R.; Allayee, Hooman; Cresci, Sharon; Daly, Mark J.; de Bakker, Paul I.W.; DePristo, Mark A.; Do, Ron; Donnelly, Peter; Farlow, Deborah N.; Fennell, Tim; Garimella, Kiran; Hazen, Stanley L.; Hu, Youna; Jordan, Daniel M.; Jun, Goo; Kathiresan, Sekar; Kang, Hyun Min; Kiezun, Adam; Lettre, Guillaume; Li, Bingshan; Li, Mingyao; Newton-Cheh, Christopher H.; Padmanabhan, Sandosh; Peloso, Gina; Pulit, Sara; Rader, Daniel J.; Reich, David; Reilly, Muredach P.; Rivas, Manuel A.; Schwartz, Steve; Scott, Laura; Siscovick, David S.; Spertus, John A.; Stitziel, Nathaniel O.; Stoletzki, Nina; Sunyaev, Shamil R.; Voight, Benjamin F.; Willer, Cristen J.; Rich, Stephen S.; Akylbekova, Ermeg; Atwood, Larry D.; Ballantyne, Christie M.; Barbalic, Maja; Barr, R. Graham; Benjamin, Emelia J.; Bis, Joshua; Boerwinkle, Eric; Bowden, Donald W.; Brody, Jennifer; Budoff, Matthew; Burke, Greg; Buxbaum, Sarah; Carr, Jeff; Chen, Donna T.; Chen, Ida Y.; Chen, Wei-Min; Concannon, Pat; Crosby, Jacy; Cupples, L. Adrienne; D’Agostino, Ralph; DeStefano, Anita L.; Dreisbach, Albert; Dupuis, Josée; Durda, J. Peter; Ellis, Jaclyn; Folsom, Aaron R.; Fornage, Myriam; Fox, Caroline S.; Fox, Ervin; Funari, Vincent; Ganesh, Santhi K.; Gardin, Julius; Goff, David; Gordon, Ora; Grody, Wayne; Gross, Myron; Guo, Xiuqing; Hall, Ira M.; Heard-Costa, Nancy L.; Heckbert, Susan R.; Heintz, Nicholas; Herrington, David M.; Hickson, DeMarc; Huang, Jie; Hwang, Shih-Jen; Jacobs, David R.; Jenny, Nancy S.; Johnson, Andrew D.; Johnson, Craig W.; Kawut, Steven; Kronmal, Richard; Kurz, Raluca; Lange, Ethan M.; Lange, Leslie A.; Larson, Martin G.; Lawson, Mark; Lewis, Cora E.; Levy, Daniel; Li, Dalin; Lin, Honghuang; Liu, Chunyu; Liu, Jiankang; Liu, Kiang; Liu, Xiaoming; Liu, Yongmei; Longstreth, William T.; Loria, Cay; Lumley, Thomas; Lunetta, Kathryn; Mackey, Aaron J.; Mackey, Rachel; Manichaikul, Ani; Maxwell, Taylor; McKnight, Barbara; Meigs, James B.; Morrison, Alanna C.; Musani, Solomon K.; Mychaleckyj, Josyf C.; Nettleton, Jennifer A.; North, Kari; O’Donnell, Christopher J.; O’Leary, Daniel; Ong, Frank; Palmas, Walter

    2014-01-01

    Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments. PMID:24507775

  1. Whole-exome sequencing identifies rare and low-frequency coding variants associated with LDL cholesterol.

    PubMed

    Lange, Leslie A; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M; Smith, Joshua D; Turner, Emily H; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-Ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A; Holmen, Oddgeir L; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C; Correa, Adolfo; Griswold, Michael E; Jakobsdottir, Johanna; Smith, Albert V; Schreiner, Pamela J; Feitosa, Mary F; Zhang, Qunyuan; Huffman, Jennifer E; Crosby, Jacy; Wassel, Christina L; Do, Ron; Franceschini, Nora; Martin, Lisa W; Robinson, Jennifer G; Assimes, Themistocles L; Crosslin, David R; Rosenthal, Elisabeth A; Tsai, Michael; Rieder, Mark J; Farlow, Deborah N; Folsom, Aaron R; Lumley, Thomas; Fox, Ervin R; Carlson, Christopher S; Peters, Ulrike; Jackson, Rebecca D; van Duijn, Cornelia M; Uitterlinden, André G; Levy, Daniel; Rotter, Jerome I; Taylor, Herman A; Gudnason, Vilmundur; Siscovick, David S; Fornage, Myriam; Borecki, Ingrid B; Hayward, Caroline; Rudan, Igor; Chen, Y Eugene; Bottinger, Erwin P; Loos, Ruth J F; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M; Gabriel, Stacey B; O'Donnell, Christopher J; Post, Wendy S; North, Kari E; Reiner, Alexander P; Boerwinkle, Eric; Psaty, Bruce M; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P; Cupples, L Adrienne; Kooperberg, Charles; Wilson, James G; Nickerson, Deborah A; Abecasis, Goncalo R; Rich, Stephen S; Tracy, Russell P; Willer, Cristen J

    2014-02-01

    Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments. PMID:24507775

  2. A Full-Genomic Sequence-Verified Protein-Coding Gene Collection for Francisella tularensis

    PubMed Central

    Murthy, Tal; Rolfs, Andreas; Hu, Yanhui; Shi, Zhenwei; Raphael, Jacob; Moreira, Donna; Kelley, Fontina; McCarron, Seamus; Jepson, Daniel; Taycher, Elena; Zuo, Dongmei; Mohr, Stephanie E.; Fernandez, Mauricio; Brizuela, Leonardo; LaBaer, Joshua

    2007-01-01

    The rapid development of new technologies for the high throughput (HT) study of proteins has increased the demand for comprehensive plasmid clone resources that support protein expression. These clones must be full-length, sequence-verified and in a flexible format. The generation of these resources requires automated pipelines supported by software management systems. Although the availability of clone resources is growing, current collections are either not complete or not fully sequence-verified. We report an automated pipeline, supported by several software applications that enabled the construction of the first comprehensive sequence-verified plasmid clone resource for more than 96% of protein coding sequences of the genome of F. tularensis, a highly virulent human pathogen and the causative agent of tularemia. This clone resource was applied to a HT protein purification pipeline successfully producing recombinant proteins for 72% of the genes. These methods and resources represent significant technological steps towards exploiting the genomic information of F. tularensis in discovery applications. PMID:17593976

  3. Predicting protein disorder by analyzing amino acid sequence

    PubMed Central

    Yang, Jack Y; Yang, Mary Qu

    2008-01-01

    Background Many protein regions and some entire proteins have no definite tertiary structure, presenting instead as dynamic, disordered ensembles under different physicochemical circumstances. These proteins and regions are known as Intrinsically Unstructured Proteins (IUP). IUP have been associated with a wide range of protein functions, along with roles in diseases characterized by protein misfolding and aggregation. Results Identifying IUP is an important task in structural and functional genomics. We extract useful features from sequences and develop machine learning algorithms for the above task. We compare our IUP predictor with PONDRs (mainly neural-network-based predictors), disEMBL (also based on neural networks) and Globplot (based on disorder propensity). Conclusion We find that augmenting features derived from physicochemical properties of amino acids (such as hydrophobicity, complexity etc.) and using an ensemble method proved beneficial. The IUP predictor is a viable alternative software tool for identifying IUP protein regions and proteins. PMID:18831799
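
    Two of the sequence-derived features named in this abstract, hydrophobicity and complexity, can be computed per sliding window as sketched below. This is a minimal illustration under assumptions: the abstract does not say which scale or complexity measure the authors used, so the Kyte-Doolittle hydropathy scale, Shannon entropy as the complexity measure, the window size, and the function name are all choices made here.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy values (assumed scale; not named in the abstract).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(seq, size):
    """Return (mean hydropathy, Shannon entropy in bits) for each window."""
    features = []
    for start in range(len(seq) - size + 1):
        window = seq[start:start + size]
        hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in window) / size
        counts = Counter(window)
        entropy = -sum(
            (c / size) * math.log2(c / size) for c in counts.values()
        )
        features.append((hydropathy, entropy))
    return features


# Hypothetical toy peptide: a charged, low-hydropathy stretch followed by
# a hydrophobic stretch.
toy = "KEKEKEKEILVILVIL"
feats = window_features(toy, size=8)
print(feats[0], feats[-1])
```

    Low mean hydropathy combined with low compositional complexity is a common signature of disordered regions, which is why such windowed values are plausible inputs to a downstream classifier.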

  4. Major Breeding Plumage Color Differences of Male Ruffs (Philomachus pugnax) Are Not Associated With Coding Sequence Variation in the MC1R Gene

    PubMed Central

    Küpper, Clemens; Burke, Terry; Lank, David B.

    2015-01-01

    Sequence variation in the melanocortin-1 receptor (MC1R) gene explains color morph variation in several species of birds and mammals. Ruffs (Philomachus pugnax) exhibit major dark/light color differences in melanin-based male breeding plumage which is closely associated with alternative reproductive behavior. A previous study identified a microsatellite marker (Ppu020) near the MC1R locus associated with the presence/absence of ornamental plumage. We investigated whether coding sequence variation in the MC1R gene explains major dark/light plumage color variation and/or the presence/absence of ornamental plumage in ruffs. Among 821 bp of the MC1R coding region from 44 male ruffs we found 3 single nucleotide polymorphisms, representing 1 nonsynonymous and 2 synonymous amino acid substitutions. None were associated with major dark/light color differences or the presence/absence of ornamental plumage. At all amino acid sites known to be functionally important in other avian species with dark/light plumage color variation, ruffs were either monomorphic or the shared polymorphism did not coincide with color morph. Neither ornamental plumage color differences nor the presence/absence of ornamental plumage in ruffs are likely to be caused entirely by amino acid variation within the coding regions of the MC1R locus. Regulatory elements and structural variation at other loci may be involved in melanin expression and contribute to the extreme plumage polymorphism observed in this species. PMID:25534935
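
    The distinction between synonymous and nonsynonymous polymorphisms used in this abstract can be made concrete with a small classifier. The sketch below uses the standard genetic code and a made-up nine-base coding fragment; it does not use ruff MC1R data, and the function name and zero-based position convention are assumptions of this example.

```python
# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(cds, pos, alt):
    """Classify a single-base change in an in-frame coding sequence.

    cds: coding DNA starting at the first codon position.
    pos: zero-based position of the changed base.
    alt: the alternative base at that position.
    """
    start = pos - pos % 3                      # first base of the affected codon
    ref_codon = cds[start:start + 3]
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "nonsynonymous"


toy_cds = "ATGGCTGAA"                          # hypothetical: Met-Ala-Glu
print(classify_snp(toy_cds, 5, "C"))           # GCT -> GCC, both alanine
print(classify_snp(toy_cds, 6, "A"))           # GAA -> AAA, Glu -> Lys
```

    Only the second change alters the encoded residue, which is the sense in which a nonsynonymous polymorphism can affect protein function while a synonymous one leaves the amino acid sequence unchanged.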

  5. The amino acid alphabet and the architecture of the protein sequence-structure map. I. Binary alphabets.

    PubMed

    Ferrada, Evandro

    2014-12-01

    The correspondence between protein sequences and structures, or sequence-structure map, relates to fundamental aspects of structural, evolutionary and synthetic biology. The specifics of the mapping, such as the fraction of accessible sequences and structures, or the sequences' ability to fold fast, are dictated by the type of interactions between the monomers that compose the sequences. The set of possible interactions between monomers is encapsulated by the potential energy function. In this study, I explore the impact of the relative forces of the potential on the architecture of the sequence-structure map. My observations rely on simple exact models of proteins and random samples of the space of potential energy functions of binary alphabets. I adopt a graph perspective and study the distribution of viable sequences and the structures they produce, as networks of sequences connected by point mutations. I observe that the relative proportion of attractive, neutral and repulsive forces defines types of potentials, that induce sequence-structure maps of vastly different architectures. I characterize the properties underlying these differences and relate them to the structure of the potential. Among these properties are the expected number and relative distribution of sequences associated to specific structures and the diversity of structures as a function of sequence divergence. I study the types of binary potentials observed in natural amino acids and show that there is a strong bias towards only some types of potentials, a bias that seems to characterize the folding code of natural proteins. I discuss implications of these observations for the architecture of the sequence-structure map of natural proteins, the construction of random libraries of peptides, and the early evolution of the natural amino acid alphabet. PMID:25473967
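
    The graph perspective described above (viable sequences connected by point mutations) can be sketched as follows. This is a minimal illustration under stated assumptions: the two-letter alphabet "HP", the toy mapping from sequences to structure labels, and the function names are hypothetical and do not reproduce the models used in the study.

```python
def point_mutants(seq, alphabet="HP"):
    """All sequences that differ from `seq` at exactly one position."""
    return [seq[:i] + b + seq[i + 1:]
            for i, a in enumerate(seq) for b in alphabet if b != a]

def neutral_networks(viable):
    """Group viable sequences into connected components.

    `viable` maps each viable sequence to a structure label. Two sequences
    are linked when they differ by one point mutation and share a structure.
    """
    seen, components = set(), []
    for start in viable:
        if start in seen:
            continue
        seen.add(start)
        component, stack = set(), [start]
        while stack:
            s = stack.pop()
            component.add(s)
            for m in point_mutants(s):
                if m in viable and m not in seen and viable[m] == viable[start]:
                    seen.add(m)
                    stack.append(m)
        components.append(component)
    return components

# Toy mapping (hypothetical): three sequences fold to "A", one to "B".
toy = {"HHP": "A", "HPP": "A", "PPH": "A", "PPP": "B"}
print(neutral_networks(toy))
```

    In this toy case "HHP" and "HPP" form one component, while "PPH" is isolated from them even though it shares structure "A", because no single point mutation connects it to another "A" sequence.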

  6. The Amino Acid Alphabet and the Architecture of the Protein Sequence-Structure Map. I. Binary Alphabets

    PubMed Central

    Ferrada, Evandro

    2014-01-01

    The correspondence between protein sequences and structures, or sequence-structure map, relates to fundamental aspects of structural, evolutionary and synthetic biology. The specifics of the mapping, such as the fraction of accessible sequences and structures, or the sequences' ability to fold fast, are dictated by the type of interactions between the monomers that compose the sequences. The set of possible interactions between monomers is encapsulated by the potential energy function. In this study, I explore the impact of the relative forces of the potential on the architecture of the sequence-structure map. My observations rely on simple exact models of proteins and random samples of the space of potential energy functions of binary alphabets. I adopt a graph perspective and study the distribution of viable sequences and the structures they produce, as networks of sequences connected by point mutations. I observe that the relative proportion of attractive, neutral and repulsive forces defines types of potentials, that induce sequence-structure maps of vastly different architectures. I characterize the properties underlying these differences and relate them to the structure of the potential. Among these properties are the expected number and relative distribution of sequences associated to specific structures and the diversity of structures as a function of sequence divergence. I study the types of binary potentials observed in natural amino acids and show that there is a strong bias towards only some types of potentials, a bias that seems to characterize the folding code of natural proteins. I discuss implications of these observations for the architecture of the sequence-structure map of natural proteins, the construction of random libraries of peptides, and the early evolution of the natural amino acid alphabet. PMID:25473967

  7. Complete coding sequence and molecular epidemiological analysis of Sindbis virus isolates from mosquitoes and humans, Finland.

    PubMed

    Sane, Jussi; Kurkela, Satu; Putkuri, Niina; Huhtamo, Eili; Vaheri, Antti; Vapalahti, Olli

    2012-09-01

    Sindbis virus (SINV) is an arthropod-borne alphavirus, which causes rash-arthritis, particularly in Finland. SINV is transmitted by mosquitoes in Finland but thus far no virus has been isolated from mosquitoes. In this study, we report the isolation of the first SINV strain from mosquitoes in Finland and its full-length protein-coding sequence. We furthermore describe the full-length coding sequence of six SINV strains previously isolated from humans in Finland and from a mosquito in Russia. The strain isolated from mosquitoes (Ilomantsi-2005M) was very closely related to all the other Northern European SINV strains. We found 9 aa positions, of which five in the nsP3 protein C terminus, to be distinctive signatures for the Northern European strains that may be associated with vector or host species adaptation. Phylogenetic analyses further indicate that SINV has a local circulation in endemic regions in Northern Europe and no novel strains are frequently being introduced. PMID:22647374

  8. Comparative Sequence Analysis of the Non-Protein-Coding Mitochondrial DNA of Inbred Rat Strains

    PubMed Central

    Abhyankar, Avinash; Park, Hee-Bok; Tonolo, Giancarlo; Luthman, Holger

    2009-01-01

    The proper function of mammalian mitochondria necessitates a coordinated expression of both nuclear and mitochondrial genes, most likely due to the co-evolution of nuclear and mitochondrial genomes. The non-protein coding regions of mitochondrial DNA (mtDNA) including the D-loop, tRNA and rRNA genes form a major component of this regulated expression unit. Here we present comparative analyses of the non-protein-coding regions from 27 Rattus norvegicus mtDNA sequences. There were two variable positions in 12S rRNA, 20 in 16S rRNA, eight within the tRNA genes and 13 in the D-loop. Only one of the three neutrality tests used demonstrated statistically significant evidence for selection in 16S rRNA and tRNA-Cys. Based on our analyses of conserved sequences, we propose that some of the variable nucleotide positions identified in 16S rRNA and tRNA-Cys, and the D-loop might be important for mitochondrial function and its regulation. PMID:19997590

  9. Cloning of human transketolase cDNAs and comparison of the nucleotide sequence of the coding region in Wernicke-Korsakoff and non-Wernicke-Korsakoff individuals.

    PubMed

    McCool, B A; Plonk, S G; Martin, P R; Singleton, C K

    1993-01-15

    Variants of the enzyme transketolase which possess reduced affinity for its cofactor thiamine pyrophosphate (high apparent Km) have been described in chronic alcoholic patients with Wernicke-Korsakoff syndrome. Since the syndrome has been shown to be directly related to thiamine deficiency, it has been hypothesized that such transketolase variants may represent a genetic predisposition to the development of this syndrome. To test this hypothesis, human transketolase cDNA clones were isolated, and their nucleotide and predicted amino acid sequences were determined. Transketolase was found to be a single copy gene which produces a single mRNA of approximately 2100 nucleotides. Additionally, the nucleotide sequence of the transketolase coding region in fibroblasts derived from two Wernicke-Korsakoff (WK) patients was compared to that of two nonalcoholic controls. Although nucleotide and predicted amino acid differences were detected between fibroblast cultures and the original cDNAs and among the cultures themselves, no specific nucleotide variations, which would encode a variant amino acid sequence, were associated exclusively with the coding region from WK patients. Thus, allelic variants of the transketolase gene cannot account for the biochemically distinct forms of the enzyme found in these patients nor be considered as a mechanism for genetic predisposition to the development of Wernicke-Korsakoff syndrome. Instead, the underlying mechanism must be extragenic and may be a result of differences in post-translational processing/modification of the transketolase polypeptide. PMID:8419340

  10. Human phosphoribosylformylglycineamide amidotransferase (FGARAT): regional mapping, complete coding sequence, isolation of a functional genomic clone, and DNA sequence analysis.

    PubMed

    Patterson, D; Bleskan, J; Gardiner, K; Bowersox, J

    1999-11-01

    Purines play essential roles in many cellular functions, including DNA replication, transcription, intra- and extra-cellular signaling, energy metabolism, and as coenzymes for many biochemical reactions. The de-novo synthesis of purines requires 10 enzymatic steps for the production of inosine monophosphate (IMP). Defects in purine metabolism are associated with human diseases. Further, many anticancer agents function as inhibitors of the de-novo biosynthetic pathway. Genes or cDNAs for most of the enzymes comprising this pathway have been isolated from humans or other mammals. One notable exception is the phosphoribosylformylglycineamide amidotransferase (FGARAT) gene, which encodes the fourth step of this pathway. This gene has been cloned from numerous microorganisms and from Drosophila melanogaster and C. elegans. We report here the identification of a human cDNA containing the coding region of the FGARAT mRNA and the isolation of a P1 clone that contains an intact human FGARAT gene. The P1 clone corrects the purine auxotrophy and protein deficiency of Chinese hamster ovary (CHO) cell mutants (AdeB) deficient in both the activity and the protein for FGARAT. The P1 clone was used to regionally map the FGARAT gene to chromosome region 17p13, a location consistent with our prior assignment of this gene to chromosome 17. A comparison of the DNA sequence of the human FGARAT and FGARAT DNA sequence from 17 other organisms is reported. The isolation of this gene means that DNA clones for all the 10 steps of IMP synthesis have been isolated from humans or other mammals. PMID:10548741

  11. Automated conserved non-coding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses

    PubMed Central

    Turco, Gina; Schnable, James C.; Pedersen, Brent; Freeling, Michael

    2013-01-01

    Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize. PMID:23874343

  12. Transactivation specificity is conserved among p53 family proteins and depends on a response element sequence code

    PubMed Central

    Ciribilli, Yari; Monti, Paola; Bisio, Alessandra; Nguyen, H. Thien; Ethayathulla, Abdul S.; Ramos, Ana; Foggetti, Giorgia; Menichini, Paola; Menendez, Daniel; Resnick, Michael A.; Viadiu, Hector; Fronza, Gilberto; Inga, Alberto

    2013-01-01

    Structural and biochemical studies have demonstrated that p73, p63 and p53 recognize DNA with identical amino acids and similar binding affinity. Here, measuring transactivation activity for a large number of response elements (REs) in yeast and human cell lines, we show that p53 family proteins also have overlapping transactivation profiles. We identified mutations at conserved amino acids of loops L1 and L3 in the DNA-binding domain that tune the transactivation potential nearly equally in p73, p63 and p53. For example, the mutant S139F in p73 has higher transactivation potential towards selected REs, enhanced DNA-binding cooperativity in vitro and a flexible loop L1 as seen in the crystal structure of the protein–DNA complex. By studying how variations in the RE sequence affect transactivation specificity, we discovered a RE-transactivation code that predicts enhanced transactivation; this correlation is stronger for promoters of genes associated with apoptosis. PMID:23892287

  13. Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach

    SciTech Connect

    Uberbacher, E.C.; Mural, R.J. (Univ. of Tennessee, Oak Ridge)

    1991-12-15

    Genes in higher eukaryotes may span tens or hundreds of kilobases with the protein-coding regions accounting for only a few percent of the total sequence. Identifying genes within large regions of uncharacterized DNA is a difficult undertaking and is currently the focus of many research efforts. The authors describe a reliable computational approach for locating protein-coding portions of genes in anonymous DNA sequence. Using a concept suggested by robotic environmental sensing, the authors' method combines a set of sensor algorithms and a neural network to localize the coding regions. Several algorithms that report local characteristics of the DNA sequence, and therefore act as sensors, are also described. In its current configuration the coding recognition module identifies 90% of coding exons of length 100 bases or greater with less than one false positive coding exon indicated per five coding exons indicated. This is a significantly lower false positive rate than any method of which the authors are aware. This module demonstrates a method with general applicability to sequence-pattern recognition problems and is available for current research efforts.
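
    The two performance figures quoted above (the share of true coding exons found, and the share of indicated exons that are false positives) can be made concrete with a short sketch. It assumes, purely for illustration, that exons are represented as (start, end) tuples and that a prediction counts only on an exact match; the abstract does not state the matching criterion, and the coordinates below are invented.

```python
def exon_metrics(true_exons, predicted_exons):
    """Return (sensitivity, fraction of predictions that are false positives)."""
    true_set, pred_set = set(true_exons), set(predicted_exons)
    true_positives = len(true_set & pred_set)
    sensitivity = true_positives / len(true_set)
    false_fraction = (len(pred_set) - true_positives) / len(pred_set)
    return sensitivity, false_fraction

# Invented coordinates: 10 annotated exons, 9 recovered, 1 spurious call.
annotated = [(i * 1000, i * 1000 + 150) for i in range(10)]
predicted = annotated[:9] + [(50000, 50120)]
sens, false_frac = exon_metrics(annotated, predicted)
print(sens, false_frac)
```

    Under this reading, "less than one false positive per five exons indicated" corresponds to a false-positive fraction below 0.2 among the predictions.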

  14. Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach.

    PubMed Central

    Uberbacher, E C; Mural, R J

    1991-01-01

    Genes in higher eukaryotes may span tens or hundreds of kilobases with the protein-coding regions accounting for only a few percent of the total sequence. Identifying genes within large regions of uncharacterized DNA is a difficult undertaking and is currently the focus of many research efforts. We describe a reliable computational approach for locating protein-coding portions of genes in anonymous DNA sequence. Using a concept suggested by robotic environmental sensing, our method combines a set of sensor algorithms and a neural network to localize the coding regions. Several algorithms that report local characteristics of the DNA sequence, and therefore act as sensors, are also described. In its current configuration the "coding recognition module" identifies 90% of coding exons of length 100 bases or greater with less than one false positive coding exon indicated per five coding exons indicated. This is a significantly lower false positive rate than any method of which we are aware. This module demonstrates a method with general applicability to sequence-pattern recognition problems and is available for current research efforts. PMID:1763041

  15. Variation in conserved non-coding sequences on chromosome 5q and susceptibility to asthma and atopy

    SciTech Connect

    Donfack, Joseph; Schneider, Daniel H.; Tan, Zheng; Kurz, Thorsten; Dubchak, Inna; Frazer, Kelly A.; Ober, Carole

    2005-09-10

    Background: Evolutionarily conserved sequences likely have biological function. Methods: To determine whether variation in conserved sequences in non-coding DNA contributes to risk for human disease, we studied six conserved non-coding elements in the Th2 cytokine cluster on human chromosome 5q31 in a large Hutterite pedigree and in samples of outbred European American and African American asthma cases and controls. Results: Among six conserved non-coding elements (>100 bp, >70 percent identity; human-mouse comparison), we identified one single nucleotide polymorphism (SNP) in each of two conserved elements and six SNPs in the flanking regions of three conserved elements. We genotyped our samples for four of these SNPs and an additional three SNPs each in the IL13 and IL4 genes. While there was only modest evidence for association with single SNPs in the Hutterite and European American samples (P<0.05), there were highly significant associations in European Americans between asthma and haplotypes comprised of SNPs in the IL4 gene (P<0.001), including a SNP in a conserved non-coding element. Furthermore, variation in the IL13 gene was strongly associated with total IgE (P = 0.00022) and allergic sensitization to mold allergens (P = 0.00076) in the Hutterites, and more modestly associated with sensitization to molds in the European Americans and African Americans (P<0.01). Conclusion: These results indicate that there is overall little variation in the conserved non-coding elements on 5q31, but variation in IL4 and IL13, including possibly one SNP in a conserved element, influence asthma and atopic phenotypes in diverse populations.
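
    The selection criterion quoted above for conserved non-coding elements (>100 bp and >70 percent identity in a human-mouse comparison) can be sketched as a simple filter over a pairwise alignment. The function names are hypothetical, and computing identity over ungapped alignment columns only is an assumption; the abstract does not say how gaps were treated.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aln_a.upper(), aln_b.upper())
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)

def is_conserved_element(aln_a, aln_b, min_len=100, min_identity=70.0):
    """Apply the >100 bp, >70 percent identity thresholds to an aligned pair."""
    ungapped = sum(a != "-" and b != "-" for a, b in zip(aln_a, aln_b))
    return ungapped > min_len and percent_identity(aln_a, aln_b) > min_identity

# Invented sequences: a 120 bp perfect match passes, an 80 bp one is too short.
print(is_conserved_element("ACGT" * 30, "ACGT" * 30))
print(is_conserved_element("ACGT" * 20, "ACGT" * 20))
```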

  16. A transcriptional regulatory element in the coding sequence of the human Bcl-2 gene

    PubMed Central

    Lang, Georgina; Gombert, Wendy M; Gould, Hannah J

    2005-01-01

    We investigated the protein-binding sites in a DNAse I hypersensitive site associated with bcl-2 gene expression in human B cells. We mapped this hypersensitive site to the coding sequence of exon 2 of the bcl-2 gene in the bcl-2-expressing REH B-cell line. Electrophoretic mobility shift assays (EMSAs) with extracts from REH cells revealed three previously unrecognized B-Myb-binding sites in this sequence. The protein was identified as B-Myb by using a specific antibody and EMSAs. Accordingly, the levels of B-Myb and bcl-2 proteins, and of Myb EMSA activity, were correlated over a wide range of cell lines, representing different stages of B-cell development. Transfection of REH cells with antisense B-myb down-regulated EMSA activity and the level of bcl-2, and led to the apoptosis of REH cells. Transfection of the bcl-2-non-expressing RPMI 8226 cell line with a B-Myb expression vector induced B-Myb EMSA activity and the expression of bcl-2. Reporter assays indicated that the HSS8 sequence containing the three B-Myb sites may act as an enhancer when it is linked to the bcl-2 gene promoter. Interaction of B-Myb with HSS8 may enhance bcl-2 gene expression by co-operating with positive regulatory elements (e.g. previously identified B-Myb response elements) or silencing negative response elements in the bcl-2 gene promoter. PMID:15606792

  17. CoRAL: predicting non-coding RNAs from small RNA-sequencing data.

    PubMed

    Leung, Yuk Yee; Ryvkin, Paul; Ungar, Lyle H; Gregory, Brian D; Wang, Li-San

    2013-08-01

    The surprising observation that virtually the entire human genome is transcribed means we know little about the function of many emerging classes of RNAs, except their astounding diversities. Traditional RNA function prediction methods rely on sequence or alignment information, which are limited in their abilities to classify the various collections of non-coding RNAs (ncRNAs). To address this, we developed Classification of RNAs by Analysis of Length (CoRAL), a machine learning-based approach for classification of RNA molecules. CoRAL uses biologically interpretable features including fragment length and cleavage specificity to distinguish between different ncRNA populations. We evaluated CoRAL using genome-wide small RNA sequencing data sets from four human tissue types and were able to classify six different types of RNAs with ∼80% cross-validation accuracy. Analysis by CoRAL revealed that microRNAs, small nucleolar and transposon-derived RNAs are highly discernible and consistent across all human tissue types assessed, whereas long intergenic ncRNAs, small cytoplasmic RNAs and small nuclear RNAs show less consistent patterns. The ability to reliably annotate loci across tissue types demonstrates the potential of CoRAL to characterize ncRNAs using small RNA sequencing data in less well-characterized organisms. PMID:23700308

  18. Two Lamprey Hedgehog Genes Share Non-Coding Regulatory Sequences and Expression Patterns with Gnathostome Hedgehogs

    PubMed Central

    Ekker, Marc; Hadzhiev, Yavor; Müller, Ferenc; Casane, Didier; Magdelenat, Ghislaine; Rétaux, Sylvie

    2010-01-01

    Hedgehog (Hh) genes play major roles in animal development and studies of their evolution, expression and function point to major differences among chordates. Here we focused on Hh genes in lampreys in order to characterize the evolution of Hh signalling at the emergence of vertebrates. Screening of a cosmid library of the river lamprey Lampetra fluviatilis and searching the preliminary genome assembly of the sea lamprey Petromyzon marinus indicate that lampreys have two Hh genes, named Hha and Hhb. Phylogenetic analyses suggest that Hha and Hhb are lamprey-specific paralogs closely related to Sonic/Indian Hh genes. Expression analysis indicates that Hha and Hhb are expressed in a Sonic Hh-like pattern. The two transcripts are expressed in largely overlapping but not identical domains in the lamprey embryonic brain, including a newly-described expression domain in the nasohypophyseal placode. Global alignments of genomic sequences and local alignment with known gnathostome regulatory motifs show that lamprey Hhs share conserved non-coding elements (CNE) with gnathostome Hhs albeit with sequences that have significantly diverged and dispersed. Functional assays using zebrafish embryos demonstrate gnathostome-like midline enhancer activity for CNEs contained in intron2. We conclude that lamprey Hh genes are gnathostome Shh-like in terms of expression and regulation. In addition, they show some lamprey-specific features, including duplication and structural (but not functional) changes in the intronic/regulatory sequences. PMID:20967201

  19. Structural gene and complete amino acid sequence of Pseudomonas aeruginosa IFO 3455 elastase.

    PubMed Central

    Fukushima, J; Yamamoto, S; Morihara, K; Atsumi, Y; Takeuchi, H; Kawamoto, S; Okuda, K

    1989-01-01

    The DNA encoding the elastase of Pseudomonas aeruginosa IFO 3455 was cloned, and its complete nucleotide sequence was determined. When the cloned gene was ligated to pUC18, the Escherichia coli expression vector, bacteria carrying the gene exhibited high levels of both elastase activity and elastase antigens. The amino acid sequence, deduced from the nucleotide sequence, revealed that the mature elastase consisted of 301 amino acids with a relative molecular mass of 32,926 daltons. The amino acid composition predicted from the DNA sequence was quite similar to the chemically determined composition of purified elastase reported previously. We also observed nucleotide sequence encoding a signal peptide and "pro" sequence consisting of 197 amino acids upstream from the mature elastase protein gene. The amino acid sequence analysis revealed that both the N-terminal sequence of the purified elastase and the N-terminal side sequences of the C-terminal tryptic peptide as well as the internal lysyl peptide fragment were completely identical to the deduced amino acid sequences. The pattern of identity of amino acid sequences was quite evident in the regions that include structurally and functionally important residues of Bacillus subtilis thermolysin. PMID:2493453

  20. Phylogenetic relationships among insect orders based on three nuclear protein-coding gene sequences.

    PubMed

    Ishiwata, Keisuke; Sasaki, Go; Ogawa, Jiro; Miyata, Takashi; Su, Zhi-Hui

    2011-02-01

    Many attempts to resolve the phylogenetic relationships of higher groups of insects have been made based on both morphological and molecular evidence; nonetheless, most of the interordinal relationships of insects remain unclear or are controversial. As a new approach, in this study we sequenced three nuclear genes encoding the catalytic subunit of DNA polymerase delta and the two largest subunits of RNA polymerase II from all insect orders. The predicted amino acid sequences (in total, approx. 3500 amino acid sites) of these proteins were subjected to phylogenetic analyses based on the maximum likelihood and Bayesian analysis methods with various models. The resulting trees strongly support the monophyly of Palaeoptera, Neoptera, Polyneoptera, and Holometabola, while within Polyneoptera, the groupings of Isoptera/"Blattaria"/Mantodea (Superorder Dictyoptera), Dictyoptera/Zoraptera, Dermaptera/Plecoptera, Mantophasmatodea/Grylloblattodea, and Embioptera/Phasmatodea are supported. Although Paraneoptera is not supported as a monophyletic group, the grouping of Phthiraptera/Psocoptera is robustly supported. The interordinal relationships within Holometabola are well resolved, and the placement of the order Hymenoptera as the sister lineage to all other holometabolous insects is strongly supported. The other orders of Holometabola are separated into two large groups, and the interordinal relationships of each group are (((Siphonaptera, Mecoptera), Diptera), (Trichoptera, Lepidoptera)) and ((Coleoptera, Strepsiptera), (Neuroptera, Raphidioptera, Megaloptera)). The sister relationship between Strepsiptera and Diptera is significantly rejected by all the statistical tests (AU, KH and wSH), while the affinity between Hymenoptera and Mecopterida is significantly rejected only by the AU and KH tests. Our results show that the use of amino acid sequences of these three nuclear genes is an effective approach for resolving the relationships of higher groups of insects. PMID:21075208

  1. Detecting selection in the blue crab, Callinectes sapidus, using DNA sequence data from multiple nuclear protein-coding genes.

    PubMed

    Yednock, Bree K; Neigel, Joseph E

    2014-01-01

    The identification of genes involved in the adaptive evolution of non-model organisms with uncharacterized genomes constitutes a major challenge. This study employed a rigorous and targeted candidate gene approach to test for positive selection on protein-coding genes of the blue crab, Callinectes sapidus. Four genes with putative roles in physiological adaptation to environmental stress were chosen as candidates. A fifth gene not expected to play a role in environmental adaptation was used as a control. Large samples (n>800) of DNA sequences from C. sapidus were used in tests of selective neutrality based on sequence polymorphisms. In combination with these, sequences from the congener C. similis were used in neutrality tests based on interspecific divergence. In multiple tests, significant departures from neutral expectations, indicative of positive selection, were found for the candidate gene trehalose 6-phosphate synthase (tps). These departures could not be explained by any of the historical population expansion or bottleneck scenarios that were evaluated in coalescent simulations. Evidence was also found for balancing selection at ATP-synthase subunit 9 (atps) using a maximum likelihood version of the Hudson, Kreitman, and Aguadé test, and positive selection favoring amino acid replacements within ATP/ADP translocase (ant) was detected using the McDonald-Kreitman test. In contrast, test statistics for the control gene, ribosomal protein L12 (rpl), which presumably has experienced the same demographic effects as the candidate loci, were not significantly different from neutral expectations and could readily be explained by demographic effects. Together, these findings demonstrate the utility of the candidate gene approach for investigating adaptation at the molecular level in a marine invertebrate for which extensive genomic resources are not available. PMID:24896825
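
    The McDonald-Kreitman test mentioned above compares replacement and synonymous changes that are fixed between species with those that are polymorphic within a species. The sketch below computes the neutrality index and a two-sided Fisher exact p-value for the resulting 2x2 table. The counts are illustrative only and are not data from the blue crab study; the function names are hypothetical.

```python
from math import comb

def neutrality_index(dn, ds, pn, ps):
    """NI = (Pn/Ps) / (Dn/Ds); values below 1 suggest excess fixed replacements."""
    return (pn / ps) / (dn / ds)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum all tables at least as extreme as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Illustrative counts: fixed (Dn=7, Ds=17), polymorphic (Pn=2, Ps=42).
dn, ds, pn, ps = 7, 17, 2, 42
print(neutrality_index(dn, ds, pn, ps))
print(fisher_exact_two_sided(dn, ds, pn, ps))
```

    A neutrality index well below 1 together with a small p-value is the pattern usually read as positive selection favoring amino acid replacements.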

  2. Human retroviruses and AIDS 1996. A compilation and analysis of nucleic acid and amino acid sequences

    SciTech Connect

    Myers, G.; Foley, B.; Korber, B.; Mellors, J.W.; Jeang, K.T.; Wain-Hobson, S.

    1997-04-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (1) Nucleic Acid Alignments and Sequences; (2) Amino Acid Alignments; (3) Analysis; (4) Related Sequences; and (5) Database Communications. Information within all the parts is updated throughout the year on the Web site, http://hiv-web.lanl.gov. While this publication could take the form of a review or sequence monograph, it is not so conceived. Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions of the parts of the compendium, the user should read the individual introductions for each part.

  3. The importance of being genomic: Non-coding and coding sequences suggest different models of toxin multi-gene family evolution.

    PubMed

    Malhotra, Anita; Creer, Simon; Harris, John B; Thorpe, Roger S

    2015-12-01

    Studies of multi-gene protein families, including many toxins, are crucial for understanding the role of gene duplication in generating protein diversity in general. However, many evolutionary analyses of gene families are based on coding sequences, and do not take into account many potentially confounding evolutionary factors, such as recombination and convergence due to selection. We illustrate this using snake venom gene sequences from the Phospholipase A2 (PLA2) subfamily. Novel gene sequences from 20 species of understudied Asian pitvipers were analyzed alongside available genomic PLA2 sequences from another four crotaline and several viperine species. In contrast to previous analyses of this toxin family based on cDNA sequences, we find that duplication events are concentrated at the tips of the tree, suggesting that major functions such as presynaptic neurotoxicity have evolved convergently multiple times in pitvipers. We provide evidence that this discrepancy is due to differing evolutionary patterns between introns and exons. The effects of several well-known sources of bias on the phylogeny were small, compared to the effect of analyses based on different partitions of the gene (whole gene sequence, non-coding regions, cDNA sequence). Switches of function were found to be largely associated with strong selection, and with duplication events. Use of coding sequences for phylogeny estimation potentially produces incorrect inferences about the action of selection on individual lineages and sites. Our results have major implications for phylogenomic methods of functional inference as well as for our understanding of the evolution of multigene families. PMID:26359851

  4. EzEditor: a versatile sequence alignment editor for both rRNA- and protein-coding genes.

    PubMed

    Jeon, Yoon-Seong; Lee, Kihyun; Park, Sang-Cheol; Kim, Bong-Soo; Cho, Yong-Joon; Ha, Sung-Min; Chun, Jongsik

    2014-02-01

    EzEditor is a Java-based molecular sequence editor allowing manipulation of both DNA and protein sequence alignments for phylogenetic analysis. It has multiple features optimized to connect initial computer-generated multiple alignment and subsequent phylogenetic analysis by providing manual editing with reference to biological information specific to the genes under consideration. It provides various functionalities for editing rRNA alignments using secondary structure information. In addition, it supports simultaneous editing of both DNA sequences and their translated protein sequences for protein-coding genes. EzEditor is, to our knowledge, the first sequence editing software designed for both rRNA- and protein-coding genes with the visualization of biologically relevant information and should be useful in molecular phylogenetic studies. EzEditor is based on Java, can be run on all major computer operating systems and is freely available from http://sw.ezbiocloud.net/ezeditor/. PMID:24425826

  5. The influence of viral coding sequences on pestivirus IRES activity reveals further parallels with translation initiation in prokaryotes.

    PubMed Central

    Fletcher, Simon P; Ali, Iraj K; Kaminski, Ann; Digard, Paul; Jackson, Richard J

    2002-01-01

    Classical swine fever virus (CSFV) is a member of the pestivirus family, which shares many features in common with hepatitis C virus (HCV). It is shown here that CSFV has an exceptionally efficient cis-acting internal ribosome entry segment (IRES), which, like that of HCV, is strongly influenced by the sequences immediately downstream of the initiation codon, and is optimal with viral coding sequences in this position. Constructs that retained 17 or more codons of viral coding sequence exhibited full IRES activity, but with only 12 codons, activity was approximately 66% of maximum in vitro (though close to maximum in transfected BHK cells), whereas with just 3 codons or fewer, the activity was only approximately 15% of maximum. The minimal coding region elements required for high activity were exchanged between HCV and CSFV. Although maximum activity was observed in each case with the homologous combination of coding region and 5' UTR, the heterologous combinations were sufficiently active to rule out a highly specific functional interplay between the 5' UTR and coding sequences. On the other hand, inversion of the coding sequences resulted in low IRES activity, particularly with the HCV coding sequences. RNA structure probing showed that the efficiency of internal initiation of these chimeric constructs correlated most closely with the degree of single-strandedness of the region around and immediately downstream of the initiation codon. The low activity IRESs could not be rescued by addition of supplementary eIF4A (the initiation factor with ATP-dependent RNA helicase activity). The extreme sensitivity to secondary structure around the initiation codon is likely to be due to the fact that the eIF4F complex (which has eIF4A as one of its subunits) is not required for and does not participate in initiation on these IRESs. PMID:12515388

  6. Natural vs. random protein sequences: Discovering combinatorics properties on amino acid words.

    PubMed

    Santoni, Daniele; Felici, Giovanni; Vergni, Davide

    2016-02-21

    Random mutations and natural selection have driven the evolution of the protein amino acid sequences that we observe in nature today. Which of the two is the dominant force in protein evolution still lacks an unambiguous answer. Random mutations tend to randomize protein sequences while, in order to have the correct functionality, one expects that selection mechanisms impose rigid constraints on amino acid sequences. Moreover, one also has to consider that the space of all possible amino acid sequences is so astonishingly large that a well-tuned amino acid sequence could reasonably be indistinguishable from a random one. To study whether random and natural amino acid sequences can be discriminated, we introduce different measures of association between pairs of amino acids in a sequence, and apply them to a dataset of 1047 natural protein sequences and 10,470 random sequences, carefully generated in order to preserve the relative length and amino acid distribution of the natural proteins. We analyze the multidimensional measures with machine learning techniques and show that, to a reasonable extent, natural protein sequences can be differentiated from random ones. PMID:26656109
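
    The abstract does not say how the random sequences were generated, only that they preserve length and amino acid distribution and number ten per natural sequence (10,470 versus 1047). One simple way to build such controls, offered here purely as an assumed sketch, is to shuffle each natural sequence, which keeps its length and composition exactly. The function names and the toy sequences are hypothetical.

```python
import random
from collections import Counter


def shuffled_control(seq, rng):
    """Return a random permutation of seq (same length, same residue counts)."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def random_controls(natural_seqs, per_seq=10, seed=0):
    """Build per_seq shuffled controls for every natural sequence.

    Assumption: shuffling stands in for the unspecified generation
    procedure of the original study.
    """
    rng = random.Random(seed)
    return [shuffled_control(s, rng) for s in natural_seqs for _ in range(per_seq)]


# Hypothetical toy "natural" sequences, not taken from the study's dataset.
natural = ["MKTAYIAKQR", "GAVLIPFMW"]
controls = random_controls(natural)

# Each control keeps the composition of the sequence it was derived from.
assert Counter(controls[0]) == Counter(natural[0])
```

    Shuffling preserves composition per sequence; sampling residues from a pooled frequency table would preserve it only on average, which is a different null model.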

  7. The Hypothesis that the Genetic Code Originated in Coupled Synthesis of Proteins and the Evolutionary Predecessors of Nucleic Acids in Primitive Cells

    PubMed Central

    Francis, Brian R.

    2015-01-01

    Although analysis of the genetic code has allowed explanations for its evolution to be proposed, little evidence exists in biochemistry and molecular biology to offer an explanation for the origin of the genetic code. In particular, two features of biology make the origin of the genetic code difficult to understand. First, nucleic acids are highly complicated polymers requiring numerous enzymes for biosynthesis. Secondly, proteins have a simple backbone with a set of 20 different amino acid side chains synthesized by a highly complicated ribosomal process in which mRNA sequences are read in triplets. Apparently, both nucleic acid and protein syntheses have extensive evolutionary histories. Supporting these processes is a complex metabolism and at the hub of metabolism are the carboxylic acid cycles. This paper advances the hypothesis that the earliest predecessor of the nucleic acids was a β-linked polyester made from malic acid, a highly conserved metabolite in the carboxylic acid cycles. In the β-linked polyester, the side chains are carboxylic acid groups capable of forming interstrand double hydrogen bonds. Evolution of the nucleic acids involved changes to the backbone and side chain of poly(β-d-malic acid). Conversion of the side chain carboxylic acid into a carboxamide or a longer side chain bearing a carboxamide group, allowed information polymers to form amide pairs between polyester chains. Aminoacylation of the hydroxyl groups of malic acid and its derivatives with simple amino acids such as glycine and alanine allowed coupling of polyester synthesis and protein synthesis. Use of polypeptides containing glycine and l-alanine for activation of two different monomers with either glycine or l-alanine allowed simple coded autocatalytic synthesis of polyesters and polypeptides and established the first genetic code. A primitive cell capable of supporting electron transport, thioester synthesis, reduction reactions, and synthesis of polyesters and

  8. The Hypothesis that the Genetic Code Originated in Coupled Synthesis of Proteins and the Evolutionary Predecessors of Nucleic Acids in Primitive Cells.

    PubMed

    Francis, Brian R

    2015-01-01

    Although analysis of the genetic code has allowed explanations for its evolution to be proposed, little evidence exists in biochemistry and molecular biology to offer an explanation for the origin of the genetic code. In particular, two features of biology make the origin of the genetic code difficult to understand. First, nucleic acids are highly complicated polymers requiring numerous enzymes for biosynthesis. Secondly, proteins have a simple backbone with a set of 20 different amino acid side chains synthesized by a highly complicated ribosomal process in which mRNA sequences are read in triplets. Apparently, both nucleic acid and protein syntheses have extensive evolutionary histories. Supporting these processes is a complex metabolism and at the hub of metabolism are the carboxylic acid cycles. This paper advances the hypothesis that the earliest predecessor of the nucleic acids was a β-linked polyester made from malic acid, a highly conserved metabolite in the carboxylic acid cycles. In the β-linked polyester, the side chains are carboxylic acid groups capable of forming interstrand double hydrogen bonds. Evolution of the nucleic acids involved changes to the backbone and side chain of poly(β-d-malic acid). Conversion of the side chain carboxylic acid into a carboxamide or a longer side chain bearing a carboxamide group, allowed information polymers to form amide pairs between polyester chains. Aminoacylation of the hydroxyl groups of malic acid and its derivatives with simple amino acids such as glycine and alanine allowed coupling of polyester synthesis and protein synthesis. Use of polypeptides containing glycine and l-alanine for activation of two different monomers with either glycine or l-alanine allowed simple coded autocatalytic synthesis of polyesters and polypeptides and established the first genetic code. A primitive cell capable of supporting electron transport, thioester synthesis, reduction reactions, and synthesis of polyesters and

  9. [Expanding genetic code: amino acids 21 and 22--selenocysteine and pyrrolysine].

    PubMed

    Lukashenko, N P

    2010-08-01

    The discovery of two nonstandard amino acids, selenocysteine and pyrrolysine, in the genetic code is discussed. These findings have expanded our understanding of the genetic code, since the repertoire of amino acids in the genetic code was supplemented by two novel ones, in addition to the standard 20 amino acids. Current views on specific mechanisms of selenocysteine insertion in forming selenoproteins are considered, as well as the results of studies of new translational components involved in biosynthesis and incorporation of selenocysteine at different stages of translation. Similarity in the strategies of decoding UGA and UAG as codons for selenocysteine and pyrrolysine, respectively, is discussed. The review also presents evidence on the medical and biological role of selenium and selenoproteins containing selenocysteine as the main biological form of selenium. PMID:20873198

  10. Natural Selection on Coding and Noncoding DNA Sequences Is Associated with Virulence Genes in a Plant Pathogenic Fungus

    PubMed Central

    Rech, Gabriel E.; Sanz-Martín, José M.; Anisimova, Maria; Sukno, Serenella A.; Thon, Michael R.

    2014-01-01

    Natural selection leaves imprints on DNA, offering the opportunity to identify functionally important regions of the genome. Identifying the genomic regions affected by natural selection within pathogens can aid in the pursuit of effective strategies to control diseases. In this study, we analyzed genome-wide patterns of selection acting on different classes of sequences in a worldwide sample of eight strains of the model plant-pathogenic fungus Colletotrichum graminicola. We found evidence of selective sweeps, balancing selection, and positive selection affecting both protein-coding and noncoding DNA of pathogenicity-related sequences. Genes encoding putative effector proteins and secondary metabolite biosynthetic enzymes show evidence of positive selection acting on the coding sequence, consistent with an Arms Race model of evolution. The 5′ untranslated regions (UTRs) of genes coding for effector proteins and genes upregulated during infection show an excess of high-frequency polymorphisms likely the consequence of balancing selection and consistent with the Red Queen hypothesis of evolution acting on these putative regulatory sequences. Based on the findings of this work, we propose that even though adaptive substitutions on coding sequences are important for proteins that interact directly with the host, polymorphisms in the regulatory sequences may confer flexibility of gene expression in the virulence processes of this important plant pathogen. PMID:25193312

  11. An alternative strategy to generate coding sequence of macrophage migration inhibitory factor-2 of Wuchereria bancrofti

    PubMed Central

    Chauhan, Nikhil; Hoti, S.L.

    2016-01-01

    Background & objectives: Different developmental stages of Wuchereria bancrofti, the major causal organism of lymphatic filariasis (LF), are difficult to obtain. Besides this limitation, to obtain complete coding sequence (CDS) of a gene one has to isolate mRNA and perform subsequent cDNA synthesis which is laborious and not successful at times. In this study, an alternative strategy employing polymerase chain reaction (PCR) was optimized and validated, to generate CDS of Macrophage migration Inhibitory Factor-2 (wbMIF-2), a gene expressed in the transition stage from L3 to L4. Methods: The genomic DNA of W. bancrofti microfilariae was extracted and used to amplify the full length wbMIF-2 gene (4.275 kb). This amplified product was used as a template for amplifying the exons separately, using the overlapping primers, which were then assembled through another round of PCR. Results: A simple strategy was developed based on PCR, which is used routinely in molecular biology laboratories. The amplified CDS of 363 bp of wbMIF-2 generated using genomic DNA splicing technique was devoid of any intronic sequence. Interpretation & conclusions: The cDNA of wbMIF-2 gene was successfully amplified from genomic DNA of microfilarial stage of W. bancrofti thus circumventing the use of inaccessible L3-L4 transitional stage of this parasite. This strategy is useful for generating CDS of genes from parasites that have restricted availability. PMID:27121522
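
    The exon-by-exon assembly described above has a simple in silico counterpart: given exon coordinates on a genomic sequence, concatenate the exons in order to obtain an intron-free CDS. The sketch below is only an illustration of that idea. The toy sequence, the coordinates, and the function name are hypothetical, and no wbMIF-2 data are used.

```python
def splice_exons(genomic, exons):
    """Concatenate exon intervals (0-based, end-exclusive) in genomic order.

    Raises ValueError if intervals overlap, are empty, or run off the sequence.
    """
    pieces = []
    last_end = 0
    for start, end in exons:
        if start < last_end or start >= end or end > len(genomic):
            raise ValueError(f"invalid exon interval: ({start}, {end})")
        pieces.append(genomic[start:end])
        last_end = end
    return "".join(pieces)


# Hypothetical toy gene: two exons (upper case) around one intron (lower case).
genomic = "ATGGCC" + "gtaagtttcag" + "AAATAA"
exons = [(0, 6), (17, 23)]

cds = splice_exons(genomic, exons)

# A complete CDS should be a whole number of codons.
assert len(cds) % 3 == 0
```

    The same check applies to the reported product: a 363 bp CDS is divisible by three, as expected for an intron-free coding sequence.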

  12. Detection by real time PCR of walnut allergen coding sequences in processed foods.

    PubMed

    Linacero, Rosario; Ballesteros, Isabel; Sanchiz, Africa; Prieto, Nuria; Iniesto, Elisa; Martinez, Yolanda; Pedrosa, Mercedes M; Muzquiz, Mercedes; Cabanillas, Beatriz; Rovira, Mercè; Burbano, Carmen; Cuadrado, Carmen

    2016-07-01

    A quantitative real-time PCR (RT-PCR) method, employing novel primer sets designed on Jug r 1, Jug r 3, and Jug r 4 allergen-coding sequences, was set up and validated. Its specificity, sensitivity, and applicability were evaluated. The DNA extraction method based on CTAB-phenol-chloroform was best for walnut. RT-PCR allowed a specific and accurate amplification of allergen sequence, and the limit of detection was 2.5 pg of walnut DNA. The method sensitivity and robustness were confirmed with spiked samples, and Jug r 3 primers detected up to 100 mg/kg of raw walnut (LOD 0.01%, LOQ 0.05%). Thermal treatment combined with pressure (autoclaving) reduced yield and amplification (integrity and quality) of walnut DNA. High hydrostatic pressure (HHP) did not produce any effect on the walnut DNA amplification. This RT-PCR method showed greater sensitivity and reliability in the detection of walnut traces in commercial foodstuffs compared with ELISA assays. PMID:26920302

  13. Complete Genome Sequence of the d-Amino Acid Catabolism Bacterium Phaeobacter sp. Strain JL2886, Isolated from Deep Seawater of the South China Sea.

    PubMed

    Fu, Yingnan; Wang, Rui; Zhang, Zilian; Jiao, Nianzhi

    2016-01-01

    Phaeobacter sp. strain JL2886, isolated from deep seawater of the South China Sea, can catabolize d-amino acids. Here, we report the complete genome sequence of Phaeobacter sp. JL2886. It comprises ~4.06 Mbp, with a G+C content of 61.52%. A total of 3,913 protein-coding genes and 10 genes related to d-amino acid catabolism were obtained. PMID:27587825

  14. Complete Genome Sequence of the d-Amino Acid Catabolism Bacterium Phaeobacter sp. Strain JL2886, Isolated from Deep Seawater of the South China Sea

    PubMed Central

    Fu, Yingnan; Wang, Rui

    2016-01-01

    Phaeobacter sp. strain JL2886, isolated from deep seawater of the South China Sea, can catabolize d-amino acids. Here, we report the complete genome sequence of Phaeobacter sp. JL2886. It comprises ~4.06 Mbp, with a G+C content of 61.52%. A total of 3,913 protein-coding genes and 10 genes related to d-amino acid catabolism were obtained. PMID:27587825

  15. Evolution of vertebrate IgM: complete amino acid sequence of the constant region of Ambystoma mexicanum mu chain deduced from cDNA sequence.

    PubMed

    Fellah, J S; Wiles, M V; Charlemagne, J; Schwager, J

    1992-10-01

    cDNA clones coding for the constant region of the Mexican axolotl (Ambystoma mexicanum) mu heavy immunoglobulin chain were selected from total spleen RNA, using a cDNA polymerase chain reaction technique. The specific 5'-end primer was an oligonucleotide homologous to the JH segment of Xenopus laevis mu chain. One of the clones, JHA/3, corresponded to the complete constant region of the axolotl mu chain, consisting of a 1362-nucleotide sequence coding for a polypeptide of 454 amino acids followed in 3' direction by a 179-nucleotide untranslated region and a polyA+ tail. The axolotl C mu is divided into four typical domains (C mu 1-C mu 4) and can be aligned with the Xenopus C mu with an overall identity of 56% at the nucleotide level. Percent identities were particularly high between C mu 1 (59%) and C mu 4 (71%). The C-terminal 20-amino acid segment which constitutes the secretory part of the mu chain is strongly homologous to the equivalent sequences of chondrichthyans and of other tetrapods, including a conserved N-linked oligosaccharide, the penultimate cysteine and the C-terminal lysine. The four C mu domains of 13 vertebrate species ranging from chondrichthyans to mammals were aligned and compared at the amino acid level. The significant number of mu-specific residues which are conserved in each of the four C mu domains argues for a continuous line of evolution of the vertebrate mu chain. This notion was confirmed by the ability to reconstitute a consistent vertebrate evolution tree based on the phylogenic parsimony analysis of the C mu 4 sequences. PMID:1382992

  16. Sequence-Based Analysis Uncovers an Abundance of Non-Coding RNA in the Total Transcriptome of Mycobacterium tuberculosis

    PubMed Central

    Arnvig, Kristine B.; Comas, Iñaki; Thomson, Nicholas R.; Houghton, Joanna; Boshoff, Helena I.; Croucher, Nicholas J.; Rose, Graham; Perkins, Timothy T.; Parkhill, Julian; Dougan, Gordon; Young, Douglas B.

    2011-01-01

    RNA sequencing provides a new perspective on the genome of Mycobacterium tuberculosis by revealing an extensive presence of non-coding RNA, including long 5’ and 3’ untranslated regions, antisense transcripts, and intergenic small RNA (sRNA) molecules. More than a quarter of all sequence reads mapping outside of ribosomal RNA genes represent non-coding RNA, and the density of reads mapping to intergenic regions was more than two-fold higher than that mapping to annotated coding sequences. Selected sRNAs were found at increased abundance in stationary phase cultures and accumulated to remarkably high levels in the lungs of chronically infected mice, indicating a potential contribution to pathogenesis. The ability of tubercle bacilli to adapt to changing environments within the host is critical to their ability to cause disease and to persist during drug treatment; it is likely that novel post-transcriptional regulatory networks will play an important role in these adaptive responses. PMID:22072964

  17. FOURTH SEMINAR TO THE MEMORY OF D.N. KLYSHKO: Algebraic solution of the synthesis problem for coded sequences

    NASA Astrophysics Data System (ADS)

    Leukhin, Anatolii N.

    2005-08-01

    The algebraic solution of a 'complex' problem of synthesis of phase-coded (PC) sequences with the zero level of side lobes of the cyclic autocorrelation function (ACF) is proposed. It is shown that the solution of the synthesis problem is connected with the existence of difference sets for a given code dimension. The problem of estimating the number of possible code combinations for a given code dimension is solved. It is pointed out that the problem of synthesis of PC sequences is related to the fundamental problems of discrete mathematics and, first of all, to a number of combinatorial problems which, like the number factorisation problem, can be solved by algebraic methods using the theory of Galois fields and groups.
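
    To make the "zero level of side lobes" property concrete, the sketch below computes the cyclic ACF of a phase-coded sequence and checks that every non-zero lag vanishes. It does not reproduce the paper's difference-set construction. As a stand-in it uses a Zadoff-Chu sequence, a well-known polyphase family with this property for odd lengths, and the function names are hypothetical.

```python
import cmath
from math import gcd


def zadoff_chu(n_len, root=1):
    """Zadoff-Chu sequence of odd length n_len with a root coprime to n_len."""
    if n_len % 2 == 0 or gcd(root, n_len) != 1:
        raise ValueError("this sketch assumes odd length and a coprime root")
    return [cmath.exp(-1j * cmath.pi * root * n * (n + 1) / n_len)
            for n in range(n_len)]


def cyclic_acf(seq):
    """Cyclic autocorrelation: R[lag] = sum_n seq[n] * conj(seq[(n + lag) mod N])."""
    n_len = len(seq)
    return [sum(seq[n] * seq[(n + lag) % n_len].conjugate() for n in range(n_len))
            for lag in range(n_len)]


seq = zadoff_chu(7)
acf = cyclic_acf(seq)

peak = abs(acf[0])                            # equals the sequence length
max_side_lobe = max(abs(v) for v in acf[1:])  # numerically zero

assert max_side_lobe < 1e-9
```

    Every element has unit modulus, so only the phases carry the code, which is what "phase-coded" refers to.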

  18. Biased gene conversion and GC-content evolution in the coding sequences of reptiles and vertebrates.

    PubMed

    Figuet, Emeric; Ballenghien, Marion; Romiguier, Jonathan; Galtier, Nicolas

    2015-01-01

    Mammalian and avian genomes are characterized by a substantial spatial heterogeneity of GC-content, which is often interpreted as reflecting the effect of local GC-biased gene conversion (gBGC), a meiotic repair bias that favors G and C over A and T alleles in high-recombining genomic regions. Surprisingly, the first fully sequenced nonavian sauropsid (i.e., reptile), the green anole Anolis carolinensis, revealed a highly homogeneous genomic GC-content landscape, suggesting the possibility that gBGC might not be at work in this lineage. Here, we analyze GC-content evolution at third-codon positions (GC3) in 44 vertebrate species, including eight newly sequenced transcriptomes, with a specific focus on nonavian sauropsids. We report that reptiles, including the green anole, have a genome-wide distribution of GC3 similar to that of mammals and birds, and we infer a strong GC3-heterogeneity to be already present in the tetrapod ancestor. We further show that the dynamics of coding sequence GC-content is largely governed by karyotypic features in vertebrates, notably in the green anole, in agreement with the gBGC hypothesis. The discrepancy between third-codon positions and noncoding DNA regarding GC-content dynamics in the green anole could not be explained by the activity of transposable elements or selection on codon usage. This analysis highlights the unique value of third-codon positions as an insertion/deletion-free marker of nucleotide substitution biases that ultimately affect the evolution of proteins. PMID:25527834
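
    GC3, the statistic at the centre of this record, is the fraction of G or C among the third positions of the codons in a coding sequence. A minimal sketch of that calculation follows. The function name and the toy sequence are hypothetical, and whether to count the stop codon or skip ambiguous bases are assumptions that the abstract does not settle.

```python
def gc3(cds):
    """Fraction of G/C at third-codon positions of an in-frame coding sequence.

    Assumptions of this sketch: the sequence starts in frame, the stop codon
    (if present) is counted, and bases other than A/C/G/T are skipped.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    third_positions = [base for base in cds[2::3] if base in "ACGT"]
    if not third_positions:
        raise ValueError("no unambiguous third-codon positions")
    return sum(base in "GC" for base in third_positions) / len(third_positions)


# Toy CDS: codons ATG GCC AAA TAA have third positions G, C, A, A.
example_gc3 = gc3("ATGGCCAAATAA")
```

    Because third positions are taken codon by codon from an aligned reading frame, the measure is unaffected by insertions and deletions elsewhere, which is the property the abstract highlights.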

  19. Biased Gene Conversion and GC-Content Evolution in the Coding Sequences of Reptiles and Vertebrates

    PubMed Central

    Figuet, Emeric; Ballenghien, Marion; Romiguier, Jonathan; Galtier, Nicolas

    2015-01-01

    Mammalian and avian genomes are characterized by a substantial spatial heterogeneity of GC-content, which is often interpreted as reflecting the effect of local GC-biased gene conversion (gBGC), a meiotic repair bias that favors G and C over A and T alleles in high-recombining genomic regions. Surprisingly, the first fully sequenced nonavian sauropsid (i.e., reptile), the green anole Anolis carolinensis, revealed a highly homogeneous genomic GC-content landscape, suggesting the possibility that gBGC might not be at work in this lineage. Here, we analyze GC-content evolution at third-codon positions (GC3) in 44 vertebrate species, including eight newly sequenced transcriptomes, with a specific focus on nonavian sauropsids. We report that reptiles, including the green anole, have a genome-wide distribution of GC3 similar to that of mammals and birds, and we infer a strong GC3-heterogeneity to be already present in the tetrapod ancestor. We further show that the dynamics of coding sequence GC-content is largely governed by karyotypic features in vertebrates, notably in the green anole, in agreement with the gBGC hypothesis. The discrepancy between third-codon positions and noncoding DNA regarding GC-content dynamics in the green anole could not be explained by the activity of transposable elements or selection on codon usage. This analysis highlights the unique value of third-codon positions as an insertion/deletion-free marker of nucleotide substitution biases that ultimately affect the evolution of proteins. PMID:25527834

  20. DNA sequencing and bar-coding using solid-state nanopores.

    PubMed

    Atas, Evrim; Singer, Alon; Meller, Amit

    2012-12-01

    Nanopores have emerged as a prominent single-molecule analytic tool with particular promise for genomic applications. In this review, we discuss two potential applications of the nanopore sensors: First, we present a nanopore-based single-molecule DNA sequencing method that utilizes optical detection for massively parallel throughput. Second, we describe a method by which nanopores can be used as single-molecule genotyping tools. For DNA sequencing, the distinction among the four types of DNA nucleobases is achieved by employing a biochemical procedure for DNA expansion. In this approach, each nucleobase in each DNA strand is converted into one of four predefined unique 16-mers in a process that preserves the nucleobase sequence. The resulting converted strands are then hybridized to a library of four molecular beacons, each carrying a unique fluorophore tag, that are perfect complements to the 16-mers used for conversion. Solid-state nanopores are then used to sequentially remove these beacons, one after the other, leading to a series of photon bursts in four colors that can be optically detected. Single-molecule genotyping is achieved by tagging the DNA fragments with γ-modified synthetic peptide nucleic acid probes coupled to an electronic characterization of the complexes using solid-state nanopores. This method can be used to identify and differentiate genes with a high level of sequence similarity at the single-molecule level, but different pathology or response to treatment. We will illustrate this method by differentiating the pol gene for two highly similar human immunodeficiency virus subtypes, paving the way for a novel diagnostics platform for viral classification. PMID:23109189
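
    The DNA expansion step described above amounts to a fixed-length substitution code: each nucleobase is replaced by one of four predefined 16-mers, so the original order can be read back block by block. The sketch below illustrates only that encoding logic. The four 16-mers and the function names are hypothetical placeholders, not the sequences used in the reviewed method.

```python
BLOCK = 16

# Hypothetical placeholder 16-mers, one per nucleobase (not from the source).
CODE_16MERS = {
    "A": "AACC" * 4,
    "C": "CCGG" * 4,
    "G": "GGTT" * 4,
    "T": "TTAA" * 4,
}
DECODE = {block: base for base, block in CODE_16MERS.items()}


def expand(seq):
    """Replace every nucleobase with its 16-mer, preserving the base order."""
    return "".join(CODE_16MERS[base] for base in seq.upper())


def read_back(expanded):
    """Recover the original sequence by reading the expansion in 16-nt blocks."""
    if len(expanded) % BLOCK != 0:
        raise ValueError("expanded length must be a multiple of 16")
    return "".join(DECODE[expanded[i:i + BLOCK]]
                   for i in range(0, len(expanded), BLOCK))


original = "GATTACA"
expanded = expand(original)

# The expansion is 16 times longer and decodes to the original order.
assert read_back(expanded) == original
```

    In the optical readout described in the abstract, each block would correspond to one beacon colour, so the sequence of photon bursts plays the role of `read_back` here.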

  1. Detection and isolation of nucleic acid sequences using a bifunctional hybridization probe

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2000-01-01

    A method for detecting and isolating a target sequence in a sample of nucleic acids is provided using a bifunctional hybridization probe capable of hybridizing to the target sequence that includes a detectable marker and a first complexing agent capable of forming a binding pair with a second complexing agent. A kit is also provided for detecting a target sequence in a sample of nucleic acids using a bifunctional hybridization probe according to this method.

  2. Complete Genome Sequence of Enterococcus mundtii QU 25, an Efficient l-(+)-Lactic Acid-Producing Bacterium

    PubMed Central

    Shiwa, Yuh; Yanase, Hiroaki; Hirose, Yuu; Satomi, Shohei; Araya-Kojima, Tomoko; Watanabe, Satoru; Zendo, Takeshi; Chibazakura, Taku; Shimizu-Kadota, Mariko; Yoshikawa, Hirofumi; Sonomoto, Kenji

    2014-01-01

    Enterococcus mundtii QU 25, a non-dairy bacterial strain of ovine faecal origin, can ferment both cellobiose and xylose to produce l-lactic acid. The use of this strain is highly desirable for economical l-lactate production from renewable biomass substrates. Genome sequence determination is necessary for the genetic improvement of this strain. We report the complete genome sequence of strain QU 25, primarily determined using Pacific Biosciences sequencing technology. The E. mundtii QU 25 genome comprises a 3 022 186-bp single circular chromosome (GC content, 38.6%) and five circular plasmids: pQY182, pQY082, pQY039, pQY024, and pQY003. In all, 2900 protein-coding sequences, 63 tRNA genes, and 6 rRNA operons were predicted in the QU 25 chromosome. Plasmid pQY024 harbours genes for mundticin production. We found that strain QU 25 produces a bacteriocin, suggesting that mundticin-encoded genes on plasmid pQY024 were functional. For lactic acid fermentation, two gene clusters were identified—one involved in the initial metabolism of xylose and uptake of pentose and the second containing genes for the pentose phosphate pathway and uptake of related sugars. This is the first complete genome sequence of an E. mundtii strain. The data provide insights into lactate production in this bacterium and its evolution among enterococci. PMID:24568933

  3. A Histone Deacetylase Adjusts Transcription Kinetics at Coding Sequences during Candida albicans Morphogenesis

    PubMed Central

    Hnisz, Denes; Bardet, Anaïs F.; Nobile, Clarissa J.; Petryshyn, Andriy; Glaser, Walter; Schöck, Ulrike; Stark, Alexander; Kuchler, Karl

    2012-01-01

    Despite their classical role as transcriptional repressors, several histone deacetylases, including the baker's yeast Set3/Hos2 complex (Set3C), facilitate gene expression. In the dimorphic human pathogen Candida albicans, the homologue of the Set3C inhibits the yeast-to-filament transition, but the precise molecular details of this function have remained elusive. Here, we use a combination of ChIP–Seq and RNA–Seq to show that the Set3C acts as a transcriptional co-factor of metabolic and morphogenesis-related genes in C. albicans. Binding of the Set3C correlates with gene expression during fungal morphogenesis; yet, surprisingly, deletion of SET3 leaves the steady-state expression level of most genes unchanged, both during exponential yeast-phase growth and during the yeast-filament transition. Fine temporal resolution of transcription in cells undergoing this transition revealed that the Set3C modulates transient expression changes of key morphogenesis-related genes. These include a transcription factor cluster comprising NRG1, EFG1, BRG1, and TEC1, which form a regulatory circuit controlling hyphal differentiation. Set3C appears to restrict the factors by modulating their transcription kinetics, and the hyperfilamentous phenotype of SET3-deficient cells can be reverted by mutating the circuit factors. These results indicate that the chromatin status at coding regions represents a dynamic platform influencing transcription kinetics. Moreover, we suggest that transcription at the coding sequence can be transiently decoupled from potentially conflicting promoter information in dynamic environments. PMID:23236295

  4. Classifier assessment and feature selection for recognizing short coding sequences of human genes.

    PubMed

    Song, Kai; Zhang, Ze; Tong, Tuo-Peng; Wu, Fang

    2012-03-01

    With the ever-increasing pace of genome sequencing, there is a great need for fast and accurate computational tools to automatically identify genes in these genomes. Although great progress has been made in the development of gene-finding algorithms during the past decades, there is still room for further improvement. In particular, the issue of recognizing short exons in eukaryotes is still not solved satisfactorily. This article is devoted to assessing various linear and kernel-based classification algorithms and selecting the best combination of Z-curve features for further improvement of the issue. Eight state-of-the-art linear and kernel-based supervised pattern recognition techniques were used to identify the short (21-192 bp) coding sequences of human genes. By measuring the prediction accuracy, the tradeoff between sensitivity and specificity and the time consumption, partial least squares (PLS) and kernel partial least squares (KPLS) algorithms were verified to be the best-performing linear and kernel-based classifiers, respectively. A surprising result was that, by making good use of the interpretability of the PLS and the Z-curve methods, 93 Z-curve features were proved to be the best selective combination. Using them, the average recognition accuracy was improved by as much as 7.7% by means of KPLS when compared with what was obtained by the Fisher discriminant analysis using 189 Z-curve variables (Gao and Zhang, 2004). The codes used are freely available from the following sources (implemented in MATLAB and supported on Linux and MS Windows): (1) SVM: http://www.support-vector-machines.org/SVM_soft.html. (2) GP: http://www.gaussianprocess.org. (3) KPLS and KFDA: Taylor, J.S., and Cristianini, N. 2004. Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, UK. (4) PLS: Wise, B.M., and Gallagher, N.B. 2011. PLS-Toolbox for use with MATLAB: ver 1.5.2. Eigenvector Technologies, Manson, WA. Supplementary Material for this article is

  5. Using Disease-Associated Coding Sequence Variation to Investigate Functional Compensation by Human Paralogous Proteins

    PubMed Central

    Miura, Sayaka; Tate, Stephanie; Kumar, Sudhir

    2015-01-01

    Gene duplication enables functional diversification in species. It is thought that duplicated genes may be able to compensate if the function of one of the gene copies is disrupted. This possibility is extensively debated with some studies reporting proteome-wide compensation, whereas others suggest functional compensation among only recent gene duplicates or no compensation at all. We report results from a systematic molecular evolutionary analysis to test the predictions of the functional compensation hypothesis. We contrasted the density of Mendelian disease-associated single nucleotide variants (dSNVs) in proteins with no discernable paralogs (singletons) with the dSNV density in proteins found in multigene families. Under the functional compensation hypothesis, we expected to find greater numbers of dSNVs in singletons due to the lack of any compensating partners. Our analyses produced an opposite pattern; paralogs have over 35% higher dSNV density than singletons. We found that these patterns are concordant with similar differences in the rates of amino acid evolution (i.e., functional constraints), as the proteins with paralogs have evolved 33% slower than singletons. Our evolutionary constraint explanation is robust to differences in family sizes, ages (young vs. old duplicates), and degrees of amino acid sequence similarities among paralogs. Therefore, disease-associated human variation does not exhibit significant signals of functional compensation among paralogous proteins, but rather an evolutionary constraint hypothesis provides a better explanation for the observed patterns of disease-associated and neutral polymorphisms in the human genome. PMID:26604664

  6. Oxidation of cellular amino acid pools leads to cytotoxic mistranslation of the genetic code

    PubMed Central

    Bullwinkle, Tammy J; Reynolds, Noah M; Raina, Medha; Moghal, Adil; Matsa, Eleftheria; Rajkovic, Andrei; Kayadibi, Huseyin; Fazlollahi, Farbod; Ryan, Christopher; Howitz, Nathaniel; Faull, Kym F; Lazazzera, Beth A; Ibba, Michael

    2014-01-01

    Aminoacyl-tRNA synthetases use a variety of mechanisms to ensure fidelity of the genetic code and ultimately select the correct amino acids to be used in protein synthesis. The physiological necessity of these quality control mechanisms in different environments remains unclear, as the cost vs benefit of accurate protein synthesis is difficult to predict. We show that in Escherichia coli, a non-coded amino acid produced through oxidative damage is a significant threat to the accuracy of protein synthesis and must be cleared by phenylalanine-tRNA synthetase in order to prevent cellular toxicity caused by mis-synthesized proteins. These findings demonstrate how stress can lead to the accumulation of non-canonical amino acids that must be excluded from the proteome in order to maintain cellular viability. DOI: http://dx.doi.org/10.7554/eLife.02501.001 PMID:24891238

  7. Partial amino acid sequence of human factor D: homology with serine proteases.

    PubMed Central

    Volanakis, J E; Bhown, A; Bennett, J C; Mole, J E

    1980-01-01

    Human factor D purified to homogeneity by a modified procedure was subjected to NH2-terminal amino acid sequence analysis by using a modified automated Beckman sequencer. We identified 48 of the first 57 NH2-terminal amino acids in a single sequencer run, using microgram quantities of factor D. The deduced amino acid sequence represents approximately 25% of the primary structure of factor D. This extended NH2-terminal amino acid sequence of factor D was compared to that of other trypsin-related serine proteases. By visual inspection, strong homologies (33-50% identity) were observed with all the serine proteases included in the comparison. Interestingly, factor D showed a higher degree of homology to serine proteases of pancreatic origin than to those of serum origin. PMID:6987665

  8. Characterization of N-glycosylation and amino acid sequence features of immunoglobulins from swine.

    PubMed

    Lopez, Paul G; Girard, Lauren; Buist, Marjorie; de Oliveira, Andrey Giovanni Gomes; Bodnar, Edward; Salama, Apolline; Soulillou, Jean-Paul; Perreault, Hélène

    2016-02-01

    The primary goal of this study was to develop a method to study the N-glycosylation of IgG from swine in order to detect epitopes containing N-glycolylneuraminic acid (Neu5Gc) and/or terminal galactose residues linked in α1-3 susceptible to cause xenograft-related problems. Samples of immunoglobulin were isolated from porcine serum using protein-A affinity chromatography. The eluate was then separated on electrophoretic gel, and bands corresponding to the N-glycosylated heavy chains were cut off the gel and subjected to tryptic digestion. Peptides and glycopeptides were separated by reversed phase liquid chromatography and fractions were collected for matrix-assisted laser desorption/ionization time-of-flight mass spectrometric (MALDI-TOF-MS) analysis. Overall no α1-3 galactose was detected, as demonstrated by complete susceptibility of terminal galactose residues to β-galactosidase digestion. Neu5Gc was detected on singly sialylated structures. Two major N-glycopeptides were found, EEQFNSTYR and EAQFNSTYR as determined by tandem MS (MS/MS), as previously reported by Butler et al. (Immunogenetics, 61, 2009, 209-230), who found 11 subclasses for porcine IgG. Out of the 11, ten include the sequence corresponding to EEQFNSTYR, and only one codes for EAQFNSTYR. In this study, glycosylation patterns associated with both chains were slightly different, in that EEQFNSTYR had a higher content of galactose. The last step of this study consisted of peptide-mapping the 11 reported porcine IgG sequences. Although there was considerable overlap, at least one unique tryptic peptide was found per IgG sequence. The workflow presented in this manuscript constitutes the first study to use MALDI-TOF-MS in the investigation of porcine IgG structural features. PMID:26586247

  9. Evolutionary divergence and limits of conserved non-coding sequence detection in plant genomes

    PubMed Central

    Reineke, Anna R.; Bornberg-Bauer, Erich; Gu, Jenny

    2011-01-01

    The discovery of regulatory motifs embedded in upstream regions of plants is a particularly challenging bioinformatics task. Previous studies have shown that motifs in plants are short compared with those found in vertebrates. Furthermore, plant genomes have undergone several diversification mechanisms such as genome duplication events which impact the evolution of regulatory motifs. In this article, a systematic phylogenomic comparison of upstream regions is conducted to further identify features of the plant regulatory genomes, the component of genomes regulating gene expression, to enable future de novo discoveries. The findings highlight differences in upstream region properties between major plant groups and the effects of divergence times and duplication events. First, clear differences in upstream region evolution can be detected between monocots and dicots, thus suggesting that a separation of these groups should be made when searching for novel regulatory motifs, particularly since universal motifs such as the TATA box are rare. Second, investigating the decay rate of significantly aligned regions suggests that a divergence time of ∼100 mya sets a limit for reliable conserved non-coding sequence (CNS) detection. Insights presented here will set a framework to help identify embedded motifs of functional relevance by understanding the limits of bioinformatics detection for CNSs. PMID:21470961

  10. Conserved Non-Coding Sequences are Associated with Rates of mRNA Decay in Arabidopsis

    PubMed Central

    Spangler, Jacob B.; Feltus, Frank Alex

    2013-01-01

    Steady-state mRNA levels are tightly regulated through a combination of transcriptional and post-transcriptional control mechanisms. The discovery of cis-acting DNA elements that encode these control mechanisms is of high importance. We have investigated the influence of conserved non-coding sequences (CNSs), DNA patterns retained after an ancient whole genome duplication event, on the breadth of gene expression and the rates of mRNA decay in Arabidopsis thaliana. The absence of CNSs near α duplicate genes was associated with a decrease in breadth of gene expression and slower mRNA decay rates while the presence of CNSs near α duplicates was associated with an increase in breadth of gene expression and faster mRNA decay rates. The observed difference in mRNA decay rate was fastest in genes with CNSs in both non-transcribed and transcribed regions, albeit through an unknown mechanism. This study supports the notion that some Arabidopsis CNSs regulate the steady-state mRNA levels through post-transcriptional control mechanisms and that CNSs also play a role in controlling the breadth of gene expression. PMID:23675377

  11. Mice carrying a complete deletion of the talin2 coding sequence are viable and fertile

    SciTech Connect

    Debrand, Emmanuel; Conti, Francesco J.; Bate, Neil; Spence, Lorraine; Mazzeo, Daniela; Pritchard, Catrin A.; Monkley, Susan J.; Critchley, David R.

    2012-09-21

    Highlights: • Mice lacking talin2 are viable and fertile with only a mildly dystrophic phenotype. • Talin2 null fibroblasts show no major defects in proliferation, adhesion or migration. • Maintaining a colony of talin2 null mice is difficult indicating an underlying defect. Abstract: Mice homozygous for several Tln2 gene targeted alleles are viable and fertile. Here we show that although the expression of talin2 protein is drastically reduced in muscle from these mice, other tissues continue to express talin2 albeit at reduced levels. We therefore generated a Tln2 allele lacking the entire coding sequence (Tln2^cd). Tln2^cd/cd mice were viable and fertile, and the genotypes of Tln2^cd/+ intercrosses were at the expected Mendelian ratio. Tln2^cd/cd mice showed no major difference in body mass or the weight of the major organs compared to wild-type, although they displayed a mildly dystrophic phenotype. Moreover, Tln2^cd/cd mouse embryo fibroblasts showed no obvious defects in cell adhesion, migration or proliferation. However, the number of Tln2^cd/cd pups surviving to adulthood was variable suggesting that such mice have an underlying defect.

  12. Source coherence impairments in a direct detection direct sequence optical code-division multiple-access system.

    PubMed

    Fsaifes, Ihsan; Lepers, Catherine; Lourdiane, Mounia; Gallion, Philippe; Beugin, Vincent; Guignard, Philippe

    2007-02-01

    We demonstrate that direct sequence optical code-division multiple-access (DS-OCDMA) encoders and decoders using sampled fiber Bragg gratings (S-FBGs) behave as multipath interferometers. In that case, chip pulses of the prime sequence codes generated by spreading in time-coherent data pulses can result from multiple reflections in the interferometers that can superimpose within a chip time duration. We show that the autocorrelation function has to be considered as the sum of complex amplitudes of the combined chip as the laser source coherence time is much greater than the integration time of the photodetector. To reduce the sensitivity of the DS-OCDMA system to the coherence time of the laser source, we analyze the use of sparse and nonperiodic quadratic congruence and extended quadratic congruence codes. PMID:17230236
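
    As a rough illustration of the prime sequence codes mentioned in this record, the sketch below builds a code family with the common textbook construction: for a prime p, codeword i has length p*p and places one pulse in each block j at offset (i*j) mod p. The function name and the choice p = 5 are illustrative assumptions and are not taken from the paper, which may use a different variant.

```python
def prime_sequence_codes(p):
    """Return the p prime sequence codewords of length p*p.

    Assumes p is prime (textbook construction; not checked here).
    """
    codes = []
    for i in range(p):
        word = [0] * (p * p)
        for j in range(p):
            # Block j holds exactly one pulse, at offset (i * j) mod p.
            word[j * p + (i * j) % p] = 1
        codes.append(word)
    return codes


codes = prime_sequence_codes(5)
for word in codes:
    print("".join(map(str, word)))
```

    Each codeword has weight p, so the pulses are sparse in time; this sparsity is what the record's discussion of chip superposition within a chip time refers to.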

  13. The rules of variation: Amino acid exchange according to the rotating circular genetic code

    PubMed Central

    Castro-Chavez, Fernando

    2011-01-01

    General guidelines for the molecular basis of functional variation are presented while focused on the rotating circular genetic code and allowable exchanges that make it resistant to genetic diseases under normal conditions. The rules of variation, bioinformatics aids for preventive medicine, are: (1) same position in the four quadrants for hydrophobic codons, (2) same or contiguous position in two quadrants for synonymous or related codons, and (3) same quadrant for equivalent codons. To preserve protein function, amino acid exchange according to the first rule takes into account the positional homology of essential hydrophobic amino acids with every codon with a central uracil in the four quadrants, the second rule includes codons for identical, acidic, or their amidic amino acids present in two quadrants, and the third rule, the smaller, aromatic, stop codons, and basic amino acids, each in proximity within a 90 degree angle. I also define codifying genes and palindromati, CTCGTGCCGAATTCGGCACGAG. PMID:20371250
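
    The sequence quoted at the end of this record can be checked for the usual molecular-biology sense of "palindrome", i.e. a strand that equals its own reverse complement. The sketch below assumes that this is the sense intended by the author; the helper names are hypothetical.

```python
# Watson-Crick complement for each DNA base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindromic(seq):
    """True if the sequence reads the same on both strands (5' to 3')."""
    return seq == reverse_complement(seq)


example = "CTCGTGCCGAATTCGGCACGAG"  # sequence quoted in the abstract
print(is_palindromic(example))
```

    For the 22-nucleotide sequence in the abstract the check returns True: the second half, TTCGGCACGAG, is the reverse complement of the first half, CTCGTGCCGAA.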

  14. Deep sequencing of the tobacco mitochondrial transcriptome reveals expressed ORFs and numerous editing sites outside coding regions

    PubMed Central

    2014-01-01

    Background: The purpose of this study was to sequence and assemble the tobacco mitochondrial transcriptome and obtain a genomic-level view of steady-state RNA abundance. Plant mitochondrial genomes have a small number of protein coding genes with large and variably sized intergenic spaces. In the tobacco mitogenome these intergenic spaces contain numerous open reading frames (ORFs) with no clear function. Results: The assembled transcriptome revealed distinct monocistronic and polycistronic transcripts along with large intergenic spaces with little to no detectable RNA. Eighteen of the 117 ORFs were found to have steady-state RNA amounts above background in both deep-sequencing and qRT-PCR experiments and ten of those were found to be polysome associated. In addition, the assembled transcriptome enabled a full mitogenome screen of RNA C→U editing sites. Six hundred and thirty five potential edits were found with 557 occurring within protein-coding genes, five in tRNA genes, and 73 in non-coding regions. These sites were found in every protein-coding transcript in the tobacco mitogenome. Conclusion: These results suggest that a small number of the ORFs within the tobacco mitogenome may produce functional proteins and that RNA editing occurs in coding and non-coding regions of mitochondrial transcripts. PMID:24433288

  15. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration.

  16. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-03-24

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration. 14 figs.

  17. Detection of spurious interruptions of protein-coding regions in cloned cDNA sequences by GeneMark analysis.

    PubMed

    Hirosawa, M; Ishikawa, K; Nagase, T; Ohara, O

    2000-09-01

    cDNA is an artificial copy of mRNA and, therefore, no cDNA can be completely free from suspicion of cloning errors. Because overlooking these cloning errors results in serious misinterpretation of cDNA sequences, development of an alerting system targeting spurious sequences in cloned cDNAs is an urgent requirement for massive cDNA sequence analysis. We describe here the application of a modified GeneMark program, originally designed for prokaryotic gene finding, for detection of artifacts in cDNA clones. This program serves to provide a warning when any spurious split of protein-coding regions is detected through statistical analysis of cDNA sequences based on Markov models. In this study, 817 cDNA sequences deposited in public databases by us were subjected to analysis using this alerting system to assess its sensitivity and specificity. The results indicated that any spurious split of protein-coding regions in cloned cDNAs could be sensitively detected and systematically revised by means of this system after the experimental validation of the alerts. Furthermore, this study offered us, for the first time, statistical data regarding the rates and types of errors causing protein-coding splits in cloned cDNAs obtained by conventional cloning methods. PMID:10984451

  18. HybPiper: Extracting coding sequence and introns for phylogenetics from high-throughput sequencing reads using target enrichment

    PubMed Central

    Johnson, Matthew G.; Gardner, Elliot M.; Liu, Yang; Medina, Rafael; Goffinet, Bernard; Shaw, A. Jonathan; Zerega, Nyree J. C.; Wickett, Norman J.

    2016-01-01

    Premise of the study: Using sequence data generated via target enrichment for phylogenetics requires reassembly of high-throughput sequence reads into loci, presenting a number of bioinformatics challenges. We developed HybPiper as a user-friendly platform for assembly of gene regions, extraction of exon and intron sequences, and identification of paralogous gene copies. We test HybPiper using baits designed to target 333 phylogenetic markers and 125 genes of functional significance in Artocarpus (Moraceae). Methods and Results: HybPiper implements parallel execution of sequence assembly in three phases: read mapping, contig assembly, and target sequence extraction. The pipeline was able to recover nearly complete gene sequences for all genes in 22 species of Artocarpus. HybPiper also recovered more than 500 bp of nontargeted intron sequence in over half of the phylogenetic markers and identified paralogous gene copies in Artocarpus. Conclusions: HybPiper was designed for Linux and Mac OS X and is freely available at https://github.com/mossmatters/HybPiper. PMID:27437175

  19. tax and rex Sequences of bovine leukaemia virus from globally diverse isolates: rex amino acid sequence more variable than tax.

    PubMed

    McGirr, K M; Buehring, G C

    2005-02-01

    Bovine leukaemia virus (BLV) is an important agricultural problem with high costs to the dairy industry. Here, we examine the variation of the tax and rex genes of BLV. The tax and rex genes share 420 bases and have overlapping reading frames. The tax gene encodes a protein that functions as a transactivator of the BLV promoter, is required for viral replication, acts on cellular promoters, and is responsible for oncogenesis. The Rex protein facilitates the export of viral mRNAs from the nucleus and regulates transcription. We have sequenced five new isolates of the tax/rex gene. We examined the five new and three previously published tax/rex DNA and predicted amino acid sequences of BLV isolates from cattle in representative regions worldwide. The highest variation among nucleic acid sequences for tax and rex was 7% and 5%, respectively; among predicted amino acid sequences for Tax and Rex, 9% and 11%, respectively. Significantly more nucleotide changes resulted in predicted amino acid changes in the rex gene than in the tax gene (P ≤ 0.0006). This variability is higher than previously reported for any region of the viral genome. This research may also have implications for the development of Tax-based vaccines. PMID:15702995

  20. The amino acid sequence of protein CM-3 from Dendroaspis polylepis polylepis (black mamba) venom.

    PubMed

    Joubert, F J

    1985-01-01

    Protein CM-3 from Dendroaspis polylepis polylepis venom was purified by gel filtration and ion exchange chromatography. It comprises 65 amino acids including eight half-cystines. The complete amino acid sequence of protein CM-3 has been elucidated. The sequence (residues 1-50) resembles that of the N-terminal sequence of the subunits of a synergistic type protein and residues 51-65 that of the C-terminal sequence of an angusticeps type protein. Mixtures of protein CM-3 and angusticeps type proteins showed no apparent synergistic effect, in that their toxicity in combination was no greater than the sum of their individual toxicities. PMID:4029488

  1. Analysis of human follistatin structure: identification of two discontinuous N-terminal sequences coding for activin A binding and structural consequences of activin binding to native proteins.

    PubMed

    Wang, Q; Keutmann, H T; Schneyer, A L; Sluss, P M

    2000-09-01

    A primary physiological function of follistatin is the binding and neutralization of activin, a transforming growth factor-beta family growth factor, and loss of function mutations are lethal. Despite the critical biological importance of follistatin's neutralization of activin, the structural basis of activin's binding to follistatin is poorly understood. The purposes of these studies were 1) to identify the primary sequence(s) within the N-terminal domain of the follistatin coding for activin binding, and 2) to determine whether activin binding to the native protein causes changes in other structural domains of follistatin. Synthetic peptide mimotopes identified within a 63-residue N-terminal domain two discontinuous sequences capable of binding labeled activin A. The first is located in a region (amino acids 3-26) of follistatin, a site previously identified by directed mutagenesis as important for activin binding. The second epitope, predicted to be located between amino acids 46 and 59, is newly identified. Although the sequences 3-26 and 46-59 code for activin binding, native follistatin only binds activin if disulfide bonding is intact. Furthermore, pyridylethylation of Cys residues followed by N-terminal sequencing and amino acid analysis revealed that all of the Cys residues in follistatin are involved in disulfide bonds and lack reactive free sulfhydryl groups. Specific ligands were used to probe the structural effects of activin binding on the other domains of the full-length molecule, comprised largely of the three 10-Cys follistatin module domains. No effects on ligand binding to follistatin-like module I or II were observed after the binding of activin A to native protein. In contrast, activin binding diminished recognition of domain III and enhanced that of the C domain by their respective monoclonal antibody probes, indicating an alteration of the antigenic structures of these regions. Thus, subsequent to activin binding, interactions are likely to

  2. A novel all-optical label processing based on multiple optical orthogonal codes sequences for optical packet switching networks

    NASA Astrophysics Data System (ADS)

    Zhang, Chongfu; Qiu, Kun; Xu, Bo; Ling, Yun

    2008-05-01

    This paper proposes an all-optical label processing scheme that uses the multiple optical orthogonal codes sequences (MOOCS)-based optical label for optical packet switching (OPS) (MOOCS-OPS) networks. In this scheme, each MOOCS is a permutation or combination of the multiple optical orthogonal codes (MOOC) selected from the multiple-groups optical orthogonal codes (MGOOC). Following a comparison of different optical label processing (OLP) schemes, the principles of MOOCS-OPS network are given and analyzed. Firstly, theoretical analyses are used to prove that MOOCS is able to greatly enlarge the number of available optical labels when compared to the previous single optical orthogonal code (SOOC) for OPS (SOOC-OPS) network. Then, the key units of the MOOCS-based optical label packets, including optical packet generation, optical label erasing, optical label extraction and optical label rewriting etc., are given and studied. These results are used to verify that the proposed MOOCS-OPS scheme is feasible.
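
    To make the notion of an optical orthogonal code (OOC) in this record concrete, the sketch below checks the defining property of a set of 0/1 codewords: every off-peak cyclic autocorrelation and every cyclic cross-correlation must stay at or below a bound lam. The helper names and the small (13, 3, 1) example are illustrative assumptions; they are not the MOOCS or MGOOC constructions of the paper.

```python
def cyclic_correlation(x, y, shift):
    """Cyclic correlation of two equal-length 0/1 sequences at a given shift."""
    n = len(x)
    return sum(x[k] * y[(k + shift) % n] for k in range(n))


def is_ooc(codewords, lam):
    """True if all off-peak auto- and all cross-correlations are <= lam."""
    n = len(codewords[0])
    for a, x in enumerate(codewords):
        for b, y in enumerate(codewords):
            for shift in range(n):
                if a == b and shift == 0:
                    continue  # skip the autocorrelation peak
                if cyclic_correlation(x, y, shift) > lam:
                    return False
    return True


def from_positions(n, positions):
    """Build a length-n 0/1 codeword with pulses at the given positions."""
    return [1 if k in positions else 0 for k in range(n)]


family = [from_positions(13, {0, 1, 4}), from_positions(13, {0, 2, 7})]
print(is_ooc(family, 1))
```

    A label built from several such codewords, as the record describes, can only enlarge the label space if the underlying codewords remain distinguishable in this correlation sense.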

  3. NCAD, a database integrating the intrinsic conformational preferences of non-coded amino acids

    PubMed Central

    Revilla-López, Guillem; Torras, Juan; Curcó, David; Casanovas, Jordi; Calaza, M. Isabel; Zanuy, David; Jiménez, Ana I.; Cativiela, Carlos; Nussinov, Ruth; Grodzinski, Piotr; Alemán, Carlos

    2010-01-01

    Peptides and proteins find an ever-increasing number of applications in the biomedical and materials engineering fields. The use of non-proteinogenic amino acids endowed with diverse physicochemical and structural features opens the possibility to design proteins and peptides with novel properties and functions. Moreover, non-proteinogenic residues are particularly useful to control the three-dimensional arrangement of peptidic chains, which is a crucial issue for most applications. However, information regarding such amino acids –also called non-coded, non-canonical or non-standard– is usually scattered among publications specialized in quite diverse fields as well as in patents. Making all these data useful to the scientific community requires new tools and a framework for their assembly and coherent organization. We have successfully compiled, organized and built a database (NCAD, Non-Coded Amino acids Database) containing information about the intrinsic conformational preferences of non-proteinogenic residues determined by quantum mechanical calculations, as well as bibliographic information about their synthesis, physical and spectroscopic characterization, conformational propensities established experimentally, and applications. The architecture of the database is presented in this work together with the first family of non-coded residues included, namely, α-tetrasubstituted α-amino acids. Furthermore, the NCAD usefulness is demonstrated through a test-case application example. PMID:20455555

  4. The Chinese hamster Alu-equivalent sequence: a conserved highly repetitious, interspersed deoxyribonucleic acid sequence in mammals has a structure suggestive of a transposable element.

    PubMed Central

    Haynes, S R; Toomey, T P; Leinwand, L; Jelinek, W R

    1981-01-01

    A consensus sequence has been determined for a major interspersed deoxyribonucleic acid repeat in the genome of Chinese hamster ovary cells (CHO cells). This sequence is extensively homologous to (i) the human Alu sequence (P. L. Deininger et al., J. Mol. Biol., in press), (ii) the mouse B1 interspersed repetitious sequence (Krayev et al., Nucleic Acids Res. 8:1201-1215, 1980), (iii) an interspersed repetitious sequence from African green monkey deoxyribonucleic acid (Dhruva et al., Proc. Natl. Acad. Sci. U.S.A. 77:4514-4518, 1980) and (iv) the CHO and mouse 4.5S ribonucleic acid (this report; F. Harada and N. Kato, Nucleic Acids Res. 8:1273-1285, 1980). Because the CHO consensus sequence shows significant homology to the human Alu sequence it is termed the CHO Alu-equivalent sequence. A conserved structure surrounding CHO Alu-equivalent family members can be recognized. It is similar to that surrounding the human Alu and the mouse B1 sequences, and is represented as follows: direct repeat-CHO-Alu-A-rich sequence-direct repeat. A composite interspersed repetitious sequence has been identified. Its structure is represented as follows: direct repeat-residue 47 to 107 of CHO-Alu-non-Alu repetitious sequence-A-rich sequence-direct repeat. Because the Alu flanking sequences resemble those that flank known transposable elements, we think it likely that the Alu sequence dispersed throughout the mammalian genome by transposition. PMID:9279371

  5. Design and performance of Huffman sequences in medical ultrasound coded excitation.

    PubMed

    Polpetta, Alessandro; Banelli, Paolo

    2012-04-01

    This paper deals with coded-excitation techniques for ultrasound medical echography. Specifically, linear Huffman coding is proposed as an alternative approach to other widely established techniques, such as complementary Golay coding and linear frequency modulation. The code design is guided by an optimization procedure that boosts the signal-to-noise ratio gain (GSNR) and, interestingly, also makes the code robust in pulsed-Doppler applications. The paper capitalizes on a thorough analytical model that can be used to design any linear coded-excitation system. This model highlights that the performance in frequency-dependent attenuating media mostly depends on the pulse-shaping waveform when the codes are characterized by almost ideal (i.e., Kronecker delta) autocorrelation. In this framework, different pulse shapers and different code lengths are considered to identify coded signals that optimize the contrast resolution at the output of the receiver pulse compression. Computer simulations confirm that the proposed Huffman codes are particularly effective, and that there are scenarios in which they may be preferable to the other established approaches, both in attenuating and non-attenuating media. Specifically, for a single scatterer at 150 mm in a 0.7-dB/(MHz·cm) attenuating medium, the proposed Huffman design achieves a main-to-side lobe ratio (MSR) equal to 65 dB, whereas tapered linear frequency modulation and classical complementary Golay codes achieve 35 and 45 dB, respectively. PMID:22547275
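
    The "almost ideal (i.e., Kronecker delta) autocorrelation" mentioned in this record can be illustrated with the complementary Golay codes that the paper uses as a baseline. The sketch below builds a Golay pair by the standard doubling rule (a, b) -> (a|b, a|-b) and sums the two aperiodic autocorrelations, which cancel at every nonzero lag. It does not reproduce the paper's Huffman sequence design; the function names and the length 8 are illustrative assumptions.

```python
def autocorrelation(seq):
    """Aperiodic autocorrelation of a real sequence for lags 0..len(seq)-1."""
    n = len(seq)
    return [sum(seq[k] * seq[k + lag] for k in range(n - lag)) for lag in range(n)]


def golay_pair(doublings):
    """Build a complementary Golay pair of length 2**doublings."""
    a, b = [1], [1]
    for _ in range(doublings):
        # Standard recursion: concatenate a with b, and a with negated b.
        a, b = a + b, a + [-v for v in b]
    return a, b


a, b = golay_pair(3)
total = [x + y for x, y in zip(autocorrelation(a), autocorrelation(b))]
print(total)
```

    The summed autocorrelation is nonzero only at lag 0 (value 2N for length-N codes), which is the sidelobe-free behaviour that pulse compression aims for; in practice this requires two transmissions, one per code of the pair.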

  6. Molecular cloning of the goose ACSL3 and ACSL5 coding domain sequences and their expression characteristics during goose fatty liver development.

    PubMed

    He, H; Liu, H H; Wang, J W; Lv, J; Li, L; Pan, Z X

    2014-01-01

    It has been demonstrated that ACSL3 and ACSL5 play important roles in fat metabolism. To investigate the primary functions of ACSL3 and ACSL5 and to evaluate their expression levels during goose fatty liver development, we cloned the ACSL3 and ACSL5 coding domain sequences (CDSs) of geese using RT-PCR and analyzed their expression characteristics under different conditions using qRT-PCR. The results showed that the goose ACSL3 (JX511975) and ACSL5 (JX511976) sequences have high similarities with the chicken sequences both at the nucleotide and amino acid levels. Both ACSL3 and ACSL5 have high expression levels in goose liver. The expression levels of ACSL3 and ACSL5 in goose liver and hepatocytes can be changed by overfeeding geese and by treatment with unsaturated fatty acids, respectively. Together, these results indicate that ACSL3 and ACSL5 play important roles during fatty liver development. The different expression characteristics of goose ACSL3 and ACSL5 suggest that these two genes may be responsible for specific functions. PMID:24469710

  7. Computer Simulation of the Determination of Amino Acid Sequences in Polypeptides

    ERIC Educational Resources Information Center

    Daubert, Stephen D.; Sontum, Stephen F.

    1977-01-01

    Describes a computer program that generates a random string of amino acids and guides the student in determining the correct sequence of a given protein by using experimental analytic data for that protein. (MLH)
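
    The first step of such a program, generating a random string of amino acids, can be sketched in a few lines of Python. This is an illustrative assumption about how one might do it today, not the 1977 program itself; the names `random_peptide` and `composition` are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard residues


def random_peptide(length, seed=None):
    """Return a random peptide of the given length (uniform over 20 residues)."""
    rng = random.Random(seed)
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def composition(peptide):
    """Residue counts, the kind of summary data a student could be given first."""
    return {aa: peptide.count(aa) for aa in sorted(set(peptide))}


peptide = random_peptide(12, seed=1)
print(peptide, composition(peptide))
```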

  8. SHARAKU: an algorithm for aligning and clustering read mapping profiles of deep sequencing in non-coding RNA processing

    PubMed Central

    Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi

    2016-01-01

    Motivation: Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns, called read mapping profiles, can be detected; these are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. Results: We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5′-end processing and 3′-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. Availability and Implementation: The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA

  9. Accuracy of sequence alignment and fold assessment using reduced amino acid alphabets.

    PubMed

    Melo, Francisco; Marti-Renom, Marc A

    2006-06-01

    Reduced or simplified amino acid alphabets group the 20 naturally occurring amino acids into a smaller number of representative protein residues. To date, several reduced amino acid alphabets have been proposed, which have been derived and optimized by a variety of methods. The resulting reduced amino acid alphabets have been applied to pattern recognition, generation of consensus sequences from multiple alignments, protein folding, and protein structure prediction. In this work, amino acid substitution matrices and statistical potentials were derived based on several reduced amino acid alphabets and their performance assessed in a large benchmark for the tasks of sequence alignment and fold assessment of protein structure models, using as a reference frame the standard alphabet of 20 amino acids. The results showed that a large reduction in the total number of residue types does not necessarily translate into a significant loss of discriminative power for sequence alignment and fold assessment. Therefore, some definitions of a few residue types are able to encode most of the relevant sequence/structure information that is present in the 20 standard amino acids. Based on these results, we suggest that the use of reduced amino acid alphabets may allow the accuracy of current substitution matrices and statistical potentials to be increased for the prediction of protein structure of remote homologs. PMID:16506243
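
    To make the idea of a reduced alphabet concrete, the sketch below maps the 20 standard residues onto six representative letters and compares sequence identity before and after the reduction. The grouping in `GROUPS` is a hypothetical one chosen only for illustration, not one of the alphabets evaluated in the paper, and the two short sequences are made up.

```python
# Hypothetical six-letter grouping (representative letter -> member residues).
GROUPS = {
    "A": "AVLIMC",  # aliphatic / hydrophobic
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # conformationally special
}
REDUCE = {aa: rep for rep, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(REDUCE[aa] for aa in seq.upper())


def identity(a, b):
    """Fraction of identical positions in two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


seq1, seq2 = "MKVLAE", "LRIVGD"  # made-up example sequences
print(identity(seq1, seq2))
print(identity(reduce_sequence(seq1), reduce_sequence(seq2)))
```

    The two toy sequences share no identical residue in the full alphabet, yet most positions agree once conservative substitutions are collapsed into one class, which is the kind of signal a reduced-alphabet substitution matrix is meant to capture.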

  10. Characterization of mouse cellular deoxyribonucleic acid homologous to Abelson murine leukemia virus-specific sequences.

    PubMed Central

    Dale, B; Ozanne, B

    1981-01-01

    The genome of Abelson murine leukemia virus (A-MuLV) consists of sequences derived from both BALB/c mouse deoxyribonucleic acid and the genome of Moloney murine leukemia virus. Using deoxyribonucleic acid linear intermediates as a source of retroviral deoxyribonucleic acid, we isolated a recombinant plasmid which contained 1.9 kilobases of the 3.5-kilobase mouse-derived sequences found in A-MuLV (A-MuLV-specific sequences). We used this clone, designated pSA-17, as a probe in restriction enzyme and Southern blot analyses to examine the arrangement of homologous sequences in BALB/c deoxyribonucleic acid (endogenous Abelson sequences). The endogenous Abelson sequences within the mouse genome were interrupted by noncoding regions, suggesting that a rearrangement of the cell sequences was required to produce the sequence found in the virus. Endogenous Abelson sequences were arranged similarly in mice that were susceptible to A-MuLV tumors and in mice that were resistant to A-MuLV tumors. An examination of three BALB/c plasmacytomas and a BALB/c early B-cell tumor likewise revealed no alteration in the arrangement of the endogenous Abelson sequences. Homology to pSA-17 was also observed in deoxyribonucleic acids prepared from rat, hamster, chicken, and human cells. An isolate of A-MuLV which encoded a 160,000-dalton transforming protein (P160) contained 700 more base pairs of mouse sequences than the standard A-MuLV isolate, which encoded a 120,000-dalton transforming protein (P120). PMID:9279386

  11. The amino acid sequence of monal pheasant lysozyme and its activity.

    PubMed

    Araki, T; Matsumoto, T; Torikata, T

    1998-10-01

    The amino acid sequence of monal pheasant lysozyme and its activity were analyzed. Carboxymethylated lysozyme was digested with trypsin and the resulting peptides were sequenced. The established amino acid sequence had one amino acid substitution at position 102 (Arg to Gly) compared with Indian peafowl lysozyme and four amino acid substitutions at positions 3 (Phe to Tyr), 15 (His to Leu), 41 (Gln to His), and 121 (Gln to His) compared with chicken lysozyme. Analysis of the time-courses of reaction using N-acetylglucosamine pentamer as a substrate showed a difference of binding free energy change (-0.4 kcal/mol) at subsites A between monal pheasant and Indian peafowl lysozyme. This was assumed to be caused by the amino acid substitution at subsite A with loss of a positive charge at position 102 (Arg102 to Gly). PMID:9836434

  12. Studies on monotreme proteins. VII. Amino acid sequence of myoglobin from the platypus, Ornithorhynchus anatinus.

    PubMed

    Fisher, W K; Thompson, E O

    1976-03-01

    Myoglobin isolated from skeletal muscle of the platypus contains 153 amino acid residues. The complete amino acid sequence has been determined following cleavage with cyanogen bromide and further digestion of the four fragments with trypsin, chymotrypsin, pepsin and thermolysin. Sequences of the purified peptides were determined by the dansyl-Edman procedure. The amino acid sequence showed 25 differences from human myoglobin and 24 from kangaroo myoglobin. Amino acid sequences in myoglobins are more conserved than sequences in the alpha- and beta-globin chains, and platypus myoglobin shows a similar number of variations in sequence to kangaroo myoglobin when compared with myoglobin of other species. The date of divergence of the platypus from other mammals was estimated at 102 +/- 31 million years, based on the number of amino acid differences between species and allowing for mutations during the evolutionary period. This estimate differs widely from the estimate given by similar treatment of the alpha- and beta-chain sequences and a constant rate of mutation of globin chains is not supported. PMID:962722

  13. cDNA-derived amino acid sequences of myoglobins from nine species of whales and dolphins.

    PubMed

    Iwanami, Kentaro; Mita, Hajime; Yamamoto, Yasuhiko; Fujise, Yoshihiro; Yamada, Tadasu; Suzuki, Tomohiko

    2006-10-01

    We determined the myoglobin (Mb) cDNA sequences of nine cetaceans, of which six are the first reports of Mb sequences: sei whale (Balaenoptera borealis), Bryde's whale (Balaenoptera edeni), pygmy sperm whale (Kogia breviceps), Stejneger's beaked whale (Mesoplodon stejnegeri), Longman's beaked whale (Indopacetus pacificus), and melon-headed whale (Peponocephala electra), and three confirm the previously determined chemical amino acid sequences: sperm whale (Physeter macrocephalus), common minke whale (Balaenoptera acutorostrata) and pantropical spotted dolphin (Stenella attenuata). We found two types of Mb in the skeletal muscle of pantropical spotted dolphin: Mb I with the same amino acid sequence as that deposited in the protein database, and Mb II, which differs at two amino acid residues compared with Mb I. Using an alignment of the amino acid or cDNA sequences of cetacean Mb, we constructed a phylogenetic tree by the NJ method. Clustering of cetacean Mb amino acid and cDNA sequences essentially follows the classical taxonomy of cetaceans, suggesting that Mb sequence data is valid for classification of cetaceans at least to the family level. PMID:16962803

  14. Molecular cloning, sequencing, and expression of omp-40, the gene coding for the major outer membrane protein from the acidophilic bacterium Thiobacillus ferrooxidans.

    PubMed

    Guiliani, N; Jerez, C A

    2000-06-01

    Thiobacillus ferrooxidans is one of the chemolithoautotrophic bacteria important in industrial biomining operations. Some of the surface components of this microorganism are probably involved in adaptation to their acidic environment and in bacterium-mineral interactions. We have isolated and characterized omp40, the gene coding for the major outer membrane protein from T. ferrooxidans. The deduced amino acid sequence of the Omp40 protein has 382 amino acids and a calculated molecular weight of 40,095.7. Omp40 forms an oligomeric structure of about 120 kDa that dissociates into the monomer (40 kDa) by heating in the presence of sodium dodecyl sulfate. The degree of identity of Omp40 amino acid sequence to porins from enterobacteria was only 22%. Nevertheless, multiple alignments of this sequence with those from several OmpC porins showed several important features conserved in the T. ferrooxidans surface protein, such as the approximate locations of 16 transmembrane beta strands, eight loops, including a large external L3 loop, and eight turns which allowed us to propose a putative 16-stranded beta-barrel porin structure for the protein. These results together with the previously known capacity of Omp40 to form ion channels in planar lipid bilayers strongly support its role as a porin in this chemolithoautotrophic acidophilic microorganism. Some characteristics of the Omp40 protein, such as the presence of a putative L3 loop with an estimated isoelectric point of 7.21 allow us to speculate that this can be the result of an adaptation of the acidophilic T. ferrooxidans to prevent free movement of protons across its outer membrane. PMID:10831405

  15. Cloning and sequence of a cDNA coding for the human beta-migrating endothelial-cell-type plasminogen activator inhibitor.

    PubMed Central

    Ny, T; Sawdey, M; Lawrence, D; Millan, J L; Loskutoff, D J

    1986-01-01

    A lambda gt11 expression library containing cDNA inserts prepared from human placental mRNA was screened immunologically using an antibody probe developed against the beta-migrating plasminogen activator inhibitor (beta-PAI) purified from cultured bovine aortic endothelial cells. Thirty-four positive clones were isolated after screening 7 × 10^5 phages. Three clones (lambda 1.2, lambda 3, and lambda 9.2) were randomly picked and further characterized. These contained inserts 1.9, 3.0, and 1.9 kilobases (kb) long, respectively. Escherichia coli lysogenic for lambda 9.2, but not for lambda gt11, produced a fusion protein of 180 kDa that was recognized by affinity-purified antibodies against the bovine aortic endothelial cell beta-PAI and had beta-PAI activity when analyzed by reverse fibrin autography. The largest cDNA insert was sequenced and shown to be 2944 base pairs (bp) long. It has a large 3' untranslated region [1788 bp, excluding the poly(A) tail] and contains the entire coding region of the mature protein but lacks the initiation codon and part of the signal peptide coding region at the 5' terminus. The two clones carrying the 1.9-kb cDNA inserts were partially sequenced and shown to be identical to the 3.0-kb cDNA except that they were truncated, lacking much of the 3' untranslated region. Blot hybridization analysis of electrophoretically fractionated RNA from the human fibrosarcoma cell line HT-1080 was performed using the 3.0-kb cDNA as hybridization probe. Two distinct transcripts, 2.2 and 3.0 kb, were detected, suggesting that the 1.9-kb cDNA may have been copied from the shorter RNA transcript. The amino acid sequence deduced from the cDNA was aligned with the NH2-terminal sequence of the human beta-PAI. Based on this alignment, the mature human beta-PAI is 379 amino acids long and contains an NH2-terminal valine. The deduced amino acid sequence has extensive (30%) homology with alpha 1-antitrypsin and antithrombin III, indicating that the beta

  16. [A comparison of the knockout efficiencies of two codon-optimized Cas9 coding sequences in zebrafish embryos].

    PubMed

    Fenghua, Zhang; Houpeng, Wang; Siyu, Huang; Feng, Xiong; Zuoyan, Zhu; Yonghua, Sun

    2016-02-01

    Recent years have witnessed the rapid development of the clustered regularly interspaced short palindromic repeats/CRISPR-associated protein (CRISPR/Cas9) system. In order to realize gene knockout with high efficiency and specificity in zebrafish, several labs have synthesized distinct Cas9 cDNA sequences which were cloned into different vectors. In this study, we chose two commonly used zebrafish-codon-optimized Cas9 coding sequences (zCas9_bz, zCas9_wc) from two different labs, and utilized them to knock out seven genes in zebrafish embryos, including the exogenous egfp and six endogenous genes (chd, hbegfa, th, eef1a1b, tyr and tcf7l1a). We compared the knockout efficiencies resulting from the two zCas9 coding sequences, by direct sequencing of PCR products, colony sequencing and phenotypic analysis. The results showed that the knockout efficiency of zCas9_wc was higher than that of zCas9_bz in all conditions. PMID:26907778

  17. New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation

    PubMed Central

    McLysaght, Aoife; Guerzoni, Daniele

    2015-01-01

    The origin of novel protein-coding genes de novo was once considered so improbable as to be impossible. In less than a decade, and especially in the last five years, this view has been overturned by extensive evidence from diverse eukaryotic lineages. There is now evidence that this mechanism has contributed a significant number of genes to genomes of organisms as diverse as Saccharomyces, Drosophila, Plasmodium, Arabidopsis and human. From simple beginnings, these genes have in some instances acquired complex structure, regulated expression and important functional roles. New genes are often thought of as dispensable late additions; however, some recent de novo genes in human can play a role in disease. Rather than an extremely rare occurrence, it is now evident that there is a relatively constant trickle of proto-genes released into the testing ground of natural selection. It is currently unknown whether de novo genes arise primarily through an ‘RNA-first’ or ‘ORF-first’ pathway. Either way, evolutionary tinkering with this pool of genetic potential may have been a significant player in the origins of lineage-specific traits and adaptations. PMID:26323763

  18. Draft Genome Sequences of Two Novel Acidimicrobiaceae Members from an Acid Mine Drainage Biofilm Metagenome.

    PubMed

    Pinto, Ameet J; Sharp, Jonathan O; Yoder, Michael J; Almstrand, Robert

    2016-01-01

    Bacteria belonging to the family Acidimicrobiaceae are frequently encountered in heavy metal-contaminated acidic environments. However, their phylogenetic and metabolic diversity is poorly resolved. We present draft genome sequences of two novel and phylogenetically distinct Acidimicrobiaceae members assembled from an acid mine drainage biofilm metagenome. PMID:26769942

  19. Draft Genome Sequences of Two Novel Acidimicrobiaceae Members from an Acid Mine Drainage Biofilm Metagenome

    PubMed Central

    Pinto, Ameet J.; Sharp, Jonathan O.; Yoder, Michael J.

    2016-01-01

    Bacteria belonging to the family Acidimicrobiaceae are frequently encountered in heavy metal-contaminated acidic environments. However, their phylogenetic and metabolic diversity is poorly resolved. We present draft genome sequences of two novel and phylogenetically distinct Acidimicrobiaceae members assembled from an acid mine drainage biofilm metagenome. PMID:26769942

  20. Two Perspectives on the Origin of the Standard Genetic Code

    NASA Astrophysics Data System (ADS)

    Sengupta, Supratim; Aggarwal, Neha; Bandhu, Ashutosh Vishwa

    2014-12-01

    The origin of a genetic code made it possible to create ordered sequences of amino acids. In this article we provide two perspectives on code origin by carrying out simulations of code-sequence coevolution in finite populations with the aim of examining how the standard genetic code may have evolved from more primitive code(s) encoding a small number of amino acids. We determine the efficacy of the physico-chemical hypothesis of code origin in the absence and presence of horizontal gene transfer (HGT) by allowing a diverse collection of code-sequence sets to compete with each other. We find that in the absence of horizontal gene transfer, natural selection between competing codes distinguished by differences in the degree of physico-chemical optimization is unable to explain the structure of the standard genetic code. However, for certain probabilities of the horizontal transfer events, a universal code emerges having a structure that is consistent with the standard genetic code.

  1. Cortical and subcortical contributions to sequence retrieval: Schematic coding of temporal context in the neocortical recollection network.

    PubMed

    Hsieh, Liang-Tien; Ranganath, Charan

    2015-11-01

    Episodic memory entails the ability to remember what happened when. Although the available evidence indicates that the hippocampus plays a role in structuring serial order information during retrieval of event sequences, information processed in the hippocampus must be conveyed to other cortical and subcortical areas in order to guide behavior. However, the extent to which other brain regions contribute to the temporal organization of episodic memory remains unclear. Here, we examined multivoxel activity pattern changes during retrieval of learned and random object sequences, focusing on a neocortical "core recollection network" that includes the medial prefrontal cortex, retrosplenial cortex, and angular gyrus, as well as on striatal areas including the caudate nucleus and putamen that have been implicated in processing of sequence information. The results demonstrate that regions of the core recollection network carry information about temporal positions within object sequences, irrespective of object information. This schematic coding of temporal information is in contrast to the putamen, which carried information specific to objects in learned sequences, and the caudate, which carried information about objects, irrespective of sequence context. Our results suggest a role for the cortical recollection network in the representation of temporal structure of events during episodic retrieval, and highlight the possible mechanisms by which the striatal areas may contribute to this process. More broadly, the results indicate that temporal sequence retrieval is a useful paradigm for dissecting the contributions of specific brain regions to episodic memory. PMID:26209802

  2. Inferences from protein and nucleic acid sequences - Early molecular evolution, divergence of kingdoms and rates of change

    NASA Technical Reports Server (NTRS)

    Dayhoff, M. O.; Barker, W. C.; Mclaughlin, P. J.

    1974-01-01

    Description of new sensitive, objective methods for establishing the probable common ancestry of very distantly related sequences and the quantitative evolutionary change which has taken place. These methods are applied to four families of proteins and nucleic acids, and evolutionary trees are derived where possible. Of the three families containing duplications of genetic material, two are nucleic acids: transfer RNA and 5S ribosomal RNA. Both of these structures are functional in the synthesis of coded proteins, and prototypes must have been present in the cell at the inception of the fundamental coding process that all living things share. There are many types of tRNA which recognize the various nucleotide triplets and the 20 amino acids. These types are thought to have arisen as a result of many gene duplications. Relationships among these types are discussed. The 5S ribosomal RNA, presently functional in both eukaryotes and prokaryotes, is very likely descended from an early form incorporating almost a complete duplication of genetic material. The amount of evolution in the various lines can again be compared. The other two families containing duplications are proteins: ferredoxin and cytochrome c.

  3. Nucleotide sequence of the fadR gene, a multifunctional regulator of fatty acid metabolism in Escherichia coli.

    PubMed Central

    DiRusso, C C

    1988-01-01

    The Escherichia coli fadR gene is a multifunctional regulator of fatty acid and acetate metabolism. In the present work the nucleotide sequence of the 1.3 kb DNA fragment which encodes FadR has been determined. The coding sequence of the fadR gene is 714 nucleotides long and is preceded by a typical E. coli ribosome binding site and is followed by a sequence predicted to be sufficient for factor-independent chain termination. Primer extension experiments demonstrated that the transcription of the fadR gene initiates with an adenine nucleotide 33 nucleotides upstream from the predicted start of translation. The derived fadR peptide has a calculated molecular weight of 26,972. This is in reasonable agreement with the apparent molecular weight of 29,000 previously estimated on the basis of maxi-cell analysis of plasmid encoded proteins. There is a segment of twenty amino acids within the predicted peptide which resembles the DNA recognition and binding site of many transcriptional regulatory proteins. Images PMID:2843809

  4. Long Non-Coding RNA and Alternative Splicing Modulations in Parkinson's Leukocytes Identified by RNA Sequencing

    PubMed Central

    Soreq, Lilach; Guffanti, Alessandro; Salomonis, Nathan; Simchovitz, Alon; Israel, Zvi; Bergman, Hagai; Soreq, Hermona

    2014-01-01

    The continuously prolonged human lifespan is accompanied by an increase in the incidence of neurodegenerative diseases, calling for the development of inexpensive blood-based diagnostics. Analyzing blood cell transcripts by RNA-Seq is a robust means of identifying novel biomarkers and is rapidly becoming commonplace. However, there is a lack of tools to discover novel exons, junctions and splicing events and to precisely and sensitively assess differential splicing through RNA-Seq data analysis and across RNA-Seq platforms. Here, we present a new and comprehensive computational workflow for whole-transcriptome RNA-Seq analysis, using an updated version of the software AltAnalyze, to identify both known and novel high-confidence alternative splicing events, and to integrate them with both protein-domains and microRNA binding annotations. We applied the novel workflow to RNA-Seq data from Parkinson's disease (PD) patients' leukocytes pre- and post- Deep Brain Stimulation (DBS) treatment and compared to healthy controls. Disease-mediated changes included decreased usage of alternative promoters and N-termini, 5′-end variations and mutually-exclusive exons. The PD regulated FUS and HNRNP A/B included prion-like domains regulated regions. We also present here a workflow to identify and analyze long non-coding RNAs (lncRNAs) via RNA-Seq data. We identified reduced lncRNA expression and selective PD-induced changes in 13 of over 6,000 detected leukocyte lncRNAs, four of which were inversely altered post-DBS. These included the U1 spliceosomal lncRNA and RP11-462G22.1, each entailing sequence complementarity to numerous microRNAs. Analysis of RNA-Seq data from PD and unaffected control brains revealed over 7,000 brain-expressed lncRNAs, of which 3,495 were co-expressed in the leukocytes including U1, which showed both leukocyte and brain increases. Furthermore, qRT-PCR validations confirmed these co-increases in PD leukocytes and two brain regions, the amygdala and substantia

  5. Two distinct ferredoxins from Rhodobacter capsulatus: complete amino acid sequences and molecular evolution.

    PubMed

    Saeki, K; Suetsugu, Y; Yao, Y; Horio, T; Marrs, B L; Matsubara, H

    1990-09-01

    Two distinct ferredoxins were purified from Rhodobacter capsulatus SB1003. Their complete amino acid sequences were determined by a combination of protease digestion, BrCN cleavage and Edman degradation. Ferredoxins I and II were composed of 64 and 111 amino acids, respectively, with molecular weights of 6,728 and 12,549 excluding iron and sulfur atoms. Both contained two Cys clusters in their amino acid sequences. The first cluster of ferredoxin I and the second cluster of ferredoxin II had a sequence, CxxCxxCxxxCP, in common with the ferredoxins found in Clostridia. The second cluster of ferredoxin I had a sequence, CxxCxxxxxxxxCxxxCM, with extra amino acids between the second and third Cys, which has been reported for other photosynthetic bacterial ferredoxins and putative ferredoxins (nif-gene products) from nitrogen-fixing bacteria, and with a unique occurrence of Met. The first cluster of ferredoxin II had a CxxCxxxxCxxxCP sequence, with two additional amino acids between the second and third Cys, a characteristic feature of Azotobacter-[3Fe-4S] [4Fe-4S]-ferredoxin. Ferredoxin II was also similar to Azotobacter-type ferredoxins with an extended carboxyl (C-) terminal sequence compared to the common Clostridium-type. The evolutionary relationship of the two together with a putative one recently found to be encoded in nifENXQ region in this bacterium [Moreno-Vivian et al. (1989) J. Bacteriol. 171, 2591-2598] is discussed. PMID:2277040
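
    The Cys-cluster patterns quoted above (CxxCxxCxxxCP, CxxCxxxxxxxxCxxxCM and CxxCxxxxCxxxCP, where x is any residue) translate directly into regular expressions. The sketch below is illustrative only: the helper names `motif_to_regex` and `find_motif` are hypothetical, and the test sequences are made up rather than taken from the R. capsulatus ferredoxins.

```python
import re

MOTIFS = {
    "clostridial":     "CxxCxxCxxxCP",
    "fd_I_cluster_2":  "CxxCxxxxxxxxCxxxCM",
    "fd_II_cluster_1": "CxxCxxxxCxxxCP",
}


def motif_to_regex(motif):
    """Turn the 'x = any residue' notation into a compiled regex."""
    return re.compile(motif.replace("x", "."))


def find_motif(motif, sequence):
    """Return (start, matched_text) of the first hit, or None if absent."""
    hit = motif_to_regex(motif).search(sequence)
    return (hit.start(), hit.group()) if hit else None


toy_a = "MACAACGGCTTTCPKK"   # made-up sequence with a CxxCxxCxxxCP cluster
toy_b = "CAACGGGGCTTTCP"     # made-up sequence with four residues between Cys 2 and 3
for name, motif in MOTIFS.items():
    print(name, find_motif(motif, toy_a), find_motif(motif, toy_b))
```

    Lowercase x is the only wildcard here; uppercase letters are matched literally as one-letter residue codes.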

  6. DNA sequence of the control region of phage D108: the N-terminal amino acid sequences of repressor and transposase are similar both in phage D108 and in its relative, phage Mu.

    PubMed Central

    Mizuuchi, M; Weisberg, R A; Mizuuchi, K

    1986-01-01

    We have determined the DNA sequence of the control region of phage D108 up to position 1419 at the left end of the phage genome. Open reading frames for the repressor gene, ner gene, and the 5' part of the A gene (which codes for transposase) are found in the sequence. The genetic organization of this region of phage D108 is quite similar to that of phage Mu in spite of considerable divergence, both in the nucleotide sequence and in the amino acid sequences of the regulatory proteins of the two phages. The N-terminal amino acid sequences of the transposases of the two phages also share only limited homology. On the other hand, a significant amino acid sequence homology was found within each phage between the N-terminal parts of the repressor and transposase. We propose that the N-terminal domains of the repressor and transposase of each phage interact functionally in the process of making the decision between the lytic and the lysogenic mode of growth. PMID:3012481

  7. Amino Acid Sequence of Anionic Peroxidase from the Windmill Palm Tree Trachycarpus fortunei

    PubMed Central

    2015-01-01

    Palm peroxidases are extremely stable and have uncommon substrate specificity. This study was designed to fill in the knowledge gap about the structures of a peroxidase from the windmill palm tree Trachycarpus fortunei. The complete amino acid sequence and partial glycosylation were determined by MALDI-top-down sequencing of native windmill palm tree peroxidase (WPTP), MALDI-TOF/TOF MS/MS of WPTP tryptic peptides, and cDNA sequencing. The propeptide of WPTP contained N- and C-terminal signal sequences which contained 21 and 17 amino acid residues, respectively. Mature WPTP was 306 amino acids in length, and its carbohydrate content ranged from 21% to 29%. Comparison to closely related royal palm tree peroxidase revealed structural features that may explain differences in their substrate specificity. The results can be used to guide engineering of WPTP and its novel applications. PMID:25383699

  8. Protein chemotaxonomy. XIII. Amino acid sequence of ferredoxin from Panax ginseng.

    PubMed

    Mino, Yoshiki

    2006-08-01

    The complete amino acid sequence of [2Fe-2S] ferredoxin from Panax ginseng (Araliaceae) has been determined by automated Edman degradation of the entire S-carboxymethylcysteinyl protein and of the peptides obtained by enzymatic digestion. This ferredoxin has a unique amino acid sequence, which includes an insertion of Tyr at the 3rd position from the amino-terminus and a deletion of two amino acid residues at the carboxyl terminus. This ferredoxin had 18 differences in its amino acid sequence compared to that of Petroselinum sativum (Umbelliferae). In contrast, 23-33 differences were observed compared to other dicotyledonous plants. This suggests that Panax ginseng is related taxonomically to umbelliferous plants. PMID:16880642
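
    Difference counts such as the 18 and 23-33 quoted above are obtained by comparing aligned sequences position by position. A minimal sketch follows, assuming the two sequences are already aligned to equal length and that a gap character in either sequence counts as one difference; the function name `count_differences` and the toy sequences are hypothetical, not the ferredoxin data.

```python
def count_differences(a, b):
    """Count alignment columns where two equal-length aligned sequences differ.

    A gap ('-') facing a residue is counted as a difference (an assumption
    of this sketch; insertions and deletions may be scored differently).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


# Made-up aligned fragments: one deletion, one insertion, one substitution.
seq_a = "ATYKV-TL"
seq_b = "A-YKVKTP"
print(count_differences(seq_a, seq_b))
```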

  9. Complete amino acid sequence and structure characterization of the taste-modifying protein, miraculin.

    PubMed

    Theerasilp, S; Hitotsuya, H; Nakajo, S; Nakaya, K; Nakamura, Y; Kurihara, Y

    1989-04-25

    The taste-modifying protein, miraculin, has the unusual property of modifying sour taste into sweet taste. The complete amino acid sequence of miraculin purified from miracle fruits by a newly developed method (Theerasilp, S., and Kurihara, Y. (1988) J. Biol. Chem. 263, 11536-11539) was determined by an automatic Edman degradation method. Miraculin was a single polypeptide with 191 amino acid residues. The calculated molecular weight based on the amino acid sequence and the carbohydrate content (13.9%) was 24,600. Asn-42 and Asn-186 were linked N-glycosidically to carbohydrate chains. High homology was found between the amino acid sequences of miraculin and soybean trypsin inhibitor. PMID:2708331

  10. N-terminal sequence of amino acids and some properties of an acid-stable alpha-amylase from citric acid-koji (Aspergillus usamii var.).

    PubMed

    Suganuma, T; Tahara, N; Kitahara, K; Nagahama, T; Inuzuka, K

    1996-01-01

    An acid-stable alpha-amylase (AA) was purified from an acidic extract of citric acid-koji (A. usamii var.). The N-terminal sequence of the first 20 amino acids of the enzyme was identical with that of AA from A. niger, but the two enzymes differed in molecular weight. HPLC analysis for identifying the anomers of products indicated that the AA hydrolyzed maltopentaose (G5) at the third glycoside bond predominantly, which differed from Taka-amylase A and the neutral alpha-amylase (NA) from the citric acid-koji. PMID:8824843

  11. Genome defense against exogenous nucleic acids in eukaryotes by non-coding DNA occurs through CRISPR-like mechanisms in the cytosol and the bodyguard protection in the nucleus.

    PubMed

    Qiu, Guo-Hua

    2016-01-01

    In this review, the protective function of the abundant non-coding DNA in the eukaryotic genome is discussed from the perspective of genome defense against exogenous nucleic acids. Peripheral non-coding DNA has been proposed to act as a bodyguard that protects the genome and the central protein-coding sequences from ionizing radiation-induced DNA damage. In the proposed mechanism of protection, the radicals generated by water radiolysis in the cytosol and IR energy are absorbed, blocked and/or reduced by peripheral heterochromatin; then, the DNA damage sites in the heterochromatin are removed and expelled from the nucleus to the cytoplasm through nuclear pore complexes, most likely through the formation of extrachromosomal circular DNA. To strengthen this hypothesis, this review summarizes the experimental evidence supporting the protective function of non-coding DNA against exogenous nucleic acids. Based on these data, I hypothesize herein about the presence of an additional line of defense formed by small RNAs in the cytosol in addition to their bodyguard protection mechanism in the nucleus. Therefore, exogenous nucleic acids may be initially inactivated in the cytosol by small RNAs generated from non-coding DNA via mechanisms similar to the prokaryotic CRISPR-Cas system. Exogenous nucleic acids may enter the nucleus, where some are absorbed and/or blocked by heterochromatin and others integrate into chromosomes. The integrated fragments and the sites of DNA damage are removed by repetitive non-coding DNA elements in the heterochromatin and excluded from the nucleus. Therefore, the normal eukaryotic genome and the central protein-coding sequences are triply protected by non-coding DNA against invasion by exogenous nucleic acids. This review provides evidence supporting the protective role of non-coding DNA in genome defense. PMID:27036064

  12. Genomic Locations of Conserved Noncoding Sequences and Their Proximal Protein-Coding Genes in Mammalian Expression Dynamics.

    PubMed

    Babarinde, Isaac Adeyemi; Saitou, Naruya

    2016-07-01

    Experimental studies have found the involvement of certain conserved noncoding sequences (CNSs) in the regulation of the proximal protein-coding genes in mammals. However, reported cases of long range enhancer activities and inter-chromosomal regulation suggest that proximity of CNSs to protein-coding genes might not be important for regulation. To test the importance of the CNS genomic location, we extracted the CNSs conserved between chicken and four mammalian species (human, mouse, dog, and cattle). These CNSs were confirmed to be under purifying selection. The intergenic CNSs are often found in clusters in gene deserts, where protein-coding genes are in paucity. The distribution pattern, ChIP-Seq, and RNA-Seq data suggested that the CNSs are more likely to be regulatory elements and not corresponding to long intergenic noncoding RNAs. Physical distances between CNS and their nearest protein coding genes were well conserved between human and mouse genomes, and CNS-flanking genes were often found in evolutionarily conserved genomic neighborhoods. ChIP-Seq signal and gene expression patterns also suggested that CNSs regulate nearby genes. Interestingly, genes with more CNSs have more evolutionarily conserved expression than those with fewer CNSs. These computationally obtained results suggest that the genomic locations of CNSs are important for their regulatory functions. In fact, various kinds of evolutionary constraints may be acting to maintain the genomic locations of CNSs and protein-coding genes in mammals to ensure proper regulation. PMID:27017584

  13. Amino Acid Coding Bias of the Hypersaline Dead Sea on an Environmental Scale

    NASA Astrophysics Data System (ADS)

    Rhodes, M. E.; Fitz-Gibbon, S.; Bodaker, I.; Beja, O.; Oren, A.; House, C.

    2008-12-01

    Metagenomic approaches can offer a broad overview of the microbial diversity in an environment and the metabolic processes performed within. At the most general level, knowing merely the GC content of an environment is enough to yield valuable insights as to the makeup of a microbial community. It has been documented that various environmental stresses, such as extreme acidity or salinity, can alter the usage of amino acids within members of an ecosystem. Here we explore the proportion of amino acids encoded within a variety of metagenomes including microbiomes from the human gut, the deep sea subsurface, acid mines, and the Dead Sea. Our primary focus is on strategies employed by hyperhalophiles to cope with the multimolar salinities of their environments. One of the approaches, used by archaea of the order Halobacteriales, as well as by a limited number of halophilic Bacteria, is to accumulate comparable salt concentrations within their cytoplasm. It has been shown within individual species that the cytoplasmic proteins must then be modified in order to maintain their functionality. The changes include an overall increase in acidic amino acids coupled to a decrease in basic amino acids and a decrease in hydrophobic amino acids compensated for by an increase in the borderline hydrophobic amino acids Ser and Thr. We observed these trends within all fully sequenced hyperhalophilic Archaea and two distinct Dead Sea metagenomes (1992 and 2007). Additionally, the ratio of acidic to basic amino acids in the Dead Sea increased between the years 1992 and 2007, from 1.55 to 1.83. This corresponds to an increase of salinity of approximately 30 percent (from 270 ppt to 350 ppt) over the same time period. The shift in ratio of acidic to basic amino acids was not just observable in the metagenome as a whole and the archaeal subpopulation but was also pronounced in the bacterial subpopulation, from 1.27 to 1.62. This shift seems to indicate a restriction of the community from a
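
    A minimal sketch of how an acidic-to-basic amino acid ratio could be computed from a set of predicted protein sequences. The abstract does not state which residues were counted; counting Asp and Glu as acidic and Lys and Arg as basic (leaving out His) is an assumption made here, and the toy sequences are invented for illustration.

```python
from collections import Counter

ACIDIC = set("DE")  # Asp, Glu
BASIC = set("KR")   # Lys, Arg (His left out; an assumption, not from the abstract)

def acidic_to_basic_ratio(proteins):
    """Return (D + E) / (K + R), pooled over all sequences given."""
    counts = Counter()
    for seq in proteins:
        counts.update(seq.upper())
    acidic = sum(counts[a] for a in ACIDIC)
    basic = sum(counts[b] for b in BASIC)
    if basic == 0:
        raise ValueError("no basic residues found")
    return acidic / basic

# Toy sequences, invented for illustration only.
toy = ["MDEEDKLR", "ADEKSTDE"]
print(round(acidic_to_basic_ratio(toy), 2))

# Relative salinity change quoted in the abstract: 270 ppt -> 350 ppt.
salinity_increase_pct = (350 - 270) / 270 * 100
print(round(salinity_increase_pct, 1))  # 29.6, i.e. "approximately 30 percent"
```

    Pooling counts over all sequences, rather than averaging per-protein ratios, weights long proteins more heavily; either choice would be defensible for a metagenome.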

  14. Cloning and sequence analysis of cDNA coding for a lectin from Helianthus tuberosus callus and its jasmonate-induced expression.

    PubMed

    Nakagawa, R; Yasokawa, D; Okumura, Y; Nagashima, K

    2000-06-01

    Two lectins (designated as HTA I and HTA II) that seemed to be isolectins were found in Helianthus tuberosus callus. cDNA encoding HTA I was isolated from a ZAP Express expression library by immunoselection by using the anti-HTA antiserum. The sequence of this cDNA consisted of 432 bp nucleotides coding for a polypeptide of 143 amino acid residues (Mr, 15,314). When introduced into E. coli, the cDNA directed the synthesis of active HTA I as indicated by the hemagglutination activity. The deduced amino acid sequence showed homology with some lectins and jasmonate-induced proteins. When callus was cultured in the presence of methyl jasmonate (MeJA), the hemagglutination activity increased in a dose-dependent manner. The levels of expression of the HTA protein and of the corresponding mRNA also increased in the treated callus. In view of these results, HTA I is considered to be a jasmonate-induced protein. PMID:10923797
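
    The stated lengths are mutually consistent if the 432 bp figure includes one stop codon, which is an assumption; the abstract does not say how the length was counted. A small check, with a hypothetical helper name:

```python
def orf_length_bp(n_residues, include_stop=True):
    """Nucleotides needed to encode n_residues, optionally plus one stop codon."""
    return 3 * n_residues + (3 if include_stop else 0)

print(orf_length_bp(143))                      # 432 with the stop codon
print(orf_length_bp(143, include_stop=False))  # 429 without it
```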

  15. Amino acids and our genetic code: a highly adaptive and interacting defense system.

    PubMed

    Verheesen, R H; Schweitzer, C M

    2012-04-01

    Since the discovery of the genetic code, Mendel's heredity theory and Darwin's evolution theory, science believes that adaptations to the environment are processes in which the adaptation of the genes is a matter of probability, in which finally the species will survive which is evolved by chance. We hypothesize that evolution and the adaptation of the genes is a well-organized fully adaptive system in which there is no rigidity of the genes. The dividing of the genes will take place in line with the environment to be expected, sensed through the mother. The encoding triplets can encode for more than one amino acid depending on the availability of the amino acids and the needed micronutrients. Those nutrients can cause disease but also prevent diseases, even cancer and auto immunity. In fact we hypothesize that auto immunity is an effective process of the organism to clear suboptimal proteins, formed due to amino acid and micronutrient deficiencies. Only when deficiencies sustain, disease will develop, otherwise the autoantibodies will function as all antibodies function, in a protective way. Furthermore, we hypothesize that essential amino acids are less important than nonessential amino acids (NEA). Species developed the ability to produce the nonessential amino acids themselves because they were not provided by food sufficiently. In contrast essential amino acids are widely available, without any evolutionary pressure. Since we can only produce small amounts of NEA and the availability in food can be reasoned to be too low they are still our main concern in amino acid availability. In conclusion, we hypothesize that increasing health will only be possible by improving our natural environment and living circumstances, not by changing the genes, since they are our last line of defense in surviving our environmental changes. PMID:22289341

  16. Complete Genome Sequence of Amino Acid-Utilizing Eubacterium acidaminophilum al-2 (DSM 3953)

    PubMed Central

    Poehlein, Anja; Andreesen, Jan R.

    2014-01-01

    Eubacterium acidaminophilum is a strictly anaerobic, Gram-positive, rod-shaped bacterium which belongs to cluster XI of the Clostridia. It ferments amino acids by a Stickland reaction. The genome harbors a chromosome (2.25 Mb) and a megaplasmid (0.8 Mb). It contains several gene clusters coding for selenocysteine-containing, glycine-derived, and amino acid-degrading reductases. PMID:24926057

  17. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis

    PubMed Central

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-01-01

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled. PMID:26586576
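
    For readers unfamiliar with N50, the sketch below uses the usual definition: the contig length at which the running total of contigs, sorted from longest to shortest, first reaches half of the assembled bases. The contig lengths are made up, and the genome-size estimate simply divides the quoted yield by the quoted coverage.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("empty contig list")

toy_contigs = [80, 70, 50, 40, 30, 20, 10]  # made-up lengths, total 300
print(n50(toy_contigs))  # 70, because 80 + 70 = 150 reaches half of 300

# Genome size implied by 181 Gbp of reads at roughly 60x coverage.
implied_genome_gbp = 181 / 60
print(round(implied_genome_gbp, 2))  # about 3.02 Gbp
```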

  18. The substrate specificity-determining amino acid code of 4-coumarate:CoA ligase.

    PubMed

    Schneider, Katja; Hövel, Klaus; Witzel, Kilian; Hamberger, Björn; Schomburg, Dietmar; Kombrink, Erich; Stuible, Hans-Peter

    2003-07-01

    To reveal the structural principles determining substrate specificity of 4-coumarate:CoA ligase (4CL), the crystal structure of the phenylalanine activation domain of gramicidin S synthetase was used as a template for homology modeling. According to our model, 12 amino acid residues lining the Arabidopsis 4CL isoform 2 (At4CL2) substrate binding pocket (SBP) function as a signature motif generally determining 4CL substrate specificity. We used this substrate specificity code to create At4CL2 gain-of-function mutants. By increasing the space within the SBP we generated ferulic- and sinapic acid-activating At4CL2 variants. Increasing the hydrophobicity of the SBP resulted in At4CL2 variants with strongly enhanced conversion of cinnamic acid. These enzyme variants are suitable tools for investigating and influencing metabolic channeling mediated by 4CL. Knowledge of the 4CL specificity code will facilitate the prediction of substrate preference of numerous, still uncharacterized 4CL-like proteins. PMID:12819348

  19. Revealing the amino acid composition of proteins within an expanded genetic code

    PubMed Central

    Aerni, Hans R.; Shifman, Mark A.; Rogulina, Svetlana; O'Donoghue, Patrick; Rinehart, Jesse

    2015-01-01

    The genetic code can be manipulated to reassign codons for the incorporation of non-standard amino acids (NSAA). Deletion of release factor 1 in Escherichia coli enhances translation of UAG (Stop) codons, yet may also extend protein synthesis at natural UAG terminated messenger RNAs. The fidelity of protein synthesis at reassigned UAG codons and the purity of the NSAA containing proteins produced require careful examination. Proteomics would be an ideal tool for these tasks, but conventional proteomic analyses cannot readily identify the extended proteins and accurately discover multiple amino acid (AA) insertions at a single UAG. To address these challenges, we created a new proteomic workflow that enabled the detection of UAG readthrough in native proteins in E. coli strains in which UAG was reassigned to encode phosphoserine. The method also enabled quantitation of NSAA and natural AA incorporation at UAG in a recombinant reporter protein. As a proof-of-principle, we measured the fidelity and purity of the phosphoserine orthogonal translation system (OTS) and used this information to improve its performance. Our results show a surprising diversity of natural AAs at reassigned stop codons. Our method can be used to improve OTSs and to quantify amino acid purity at reassigned codons in organisms with expanded genetic codes. PMID:25378305

  20. Revealing the amino acid composition of proteins within an expanded genetic code.

    PubMed

    Aerni, Hans R; Shifman, Mark A; Rogulina, Svetlana; O'Donoghue, Patrick; Rinehart, Jesse

    2015-01-01

    The genetic code can be manipulated to reassign codons for the incorporation of non-standard amino acids (NSAA). Deletion of release factor 1 in Escherichia coli enhances translation of UAG (Stop) codons, yet may also extend protein synthesis at natural UAG terminated messenger RNAs. The fidelity of protein synthesis at reassigned UAG codons and the purity of the NSAA containing proteins produced require careful examination. Proteomics would be an ideal tool for these tasks, but conventional proteomic analyses cannot readily identify the extended proteins and accurately discover multiple amino acid (AA) insertions at a single UAG. To address these challenges, we created a new proteomic workflow that enabled the detection of UAG readthrough in native proteins in E. coli strains in which UAG was reassigned to encode phosphoserine. The method also enabled quantitation of NSAA and natural AA incorporation at UAG in a recombinant reporter protein. As a proof-of-principle, we measured the fidelity and purity of the phosphoserine orthogonal translation system (OTS) and used this information to improve its performance. Our results show a surprising diversity of natural AAs at reassigned stop codons. Our method can be used to improve OTSs and to quantify amino acid purity at reassigned codons in organisms with expanded genetic codes. PMID:25378305

  1. Complete genome sequence of the actinobacterium Amycolatopsis japonica MG417-CF17(T) (=DSM 44213T) producing (S,S)-N,N'-ethylenediaminedisuccinic acid.

    PubMed

    Stegmann, Evi; Albersmeier, Andreas; Spohn, Marius; Gert, Helena; Weber, Tilmann; Wohlleben, Wolfgang; Kalinowski, Jörn; Rückert, Christian

    2014-11-10

    We report the complete genome sequence of Amycolatopsis japonica MG417-CF17(T) (=DSM 44213(T)) which was identified as the producer of (S,S)-N,N'-ethylenediaminedisuccinic acid during a screening for phospholipase C inhibitors. The genome of A. japonica MG417-CF17(T) consists of two replicons: the chromosome (8,961,318 bp, 68.89% G+C content) and the plasmid pAmyja1 (92,539 bp, 68.23% G+C content), encoding a total of 8422 protein coding genes. Analysis of the sequence data revealed 30 clusters encoding the biosynthesis of secondary metabolites. PMID:25193710
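
    The two replicon figures can be combined into a total genome size and a length-weighted G+C content. This is a small worked calculation from the numbers in the abstract; the function name is illustrative, and the combined values are not stated in the record itself.

```python
# (length in bp, G+C percent) for each replicon, as quoted in the abstract.
replicons = {
    "chromosome": (8_961_318, 68.89),
    "pAmyja1": (92_539, 68.23),
}

def genome_summary(replicons):
    """Return total length in bp and the length-weighted G+C percentage."""
    total_bp = sum(length for length, _ in replicons.values())
    weighted_gc = sum(length * gc for length, gc in replicons.values()) / total_bp
    return total_bp, weighted_gc

total_bp, gc = genome_summary(replicons)
print(total_bp)      # 9053857
print(round(gc, 2))  # 68.88
```

    Because the plasmid is only about 1% of the genome, the combined G+C content is almost identical to that of the chromosome.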

  2. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1997-01-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided.

  3. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1997-04-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided. 7 figs.

  4. Functional Divergence of APETALA1 and FRUITFULL is due to Changes in both Regulation and Coding Sequence.

    PubMed

    McCarthy, Elizabeth W; Mohamed, Abeer; Litt, Amy

    2015-01-01

    Gene duplications are prevalent in plants, and functional divergence subsequent to duplication may be linked with the occurrence of novel phenotypes in plant evolution. Here, we examine the functional divergence of Arabidopsis thaliana APETALA1 (AP1) and FRUITFULL (FUL), which arose via a duplication correlated with the origin of the core eudicots. Both AP1 and FUL play a role in floral meristem identity, but AP1 is required for the formation of sepals and petals whereas FUL is involved in cauline leaf and fruit development. AP1 and FUL are expressed in mutually exclusive domains but also differ in sequence, with unique conserved motifs in the C-terminal domains of the proteins that suggest functional differentiation. To determine whether the functional divergence of AP1 and FUL is due to changes in regulation or changes in coding sequence, we performed promoter swap experiments, in which FUL was expressed in the AP1 domain in the ap1 mutant and vice versa. Our results show that FUL can partially substitute for AP1, and AP1 can partially substitute for FUL; thus, the functional divergence between AP1 and FUL is due to changes in both regulation and coding sequence. We also mutated AP1 and FUL conserved motifs to determine if they are required for protein function and tested the ability of these mutated proteins to interact in yeast with known partners. We found that these motifs appear to play at best a minor role in protein function and dimerization capability, despite being strongly conserved. Our results suggest that the functional differentiation of these two paralogous key transcriptional regulators involves both differences in regulation and in sequence; however, sequence changes in the form of unique conserved motifs do not explain the differences observed. PMID:26697035

  5. DNA sequence variation in a non-coding region of low recombination on the human X chromosome.

    PubMed

    Kaessmann, H; Heissig, F; von Haeseler, A; Pääbo, S

    1999-05-01

    DNA sequence variation has become a major source of insight regarding the origin and history of our species as well as an important tool for the identification of allelic variants associated with disease. Comparative sequencing of DNA has to date focused mainly on mitochondrial (mt) DNA, which due to its apparent lack of recombination and high evolutionary rate lends itself well to the study of human evolution. These advantages also entail limitations. For example, the high mutation rate of mtDNA results in multiple substitutions that make phylogenetic analysis difficult and, because mtDNA is maternally inherited, it reflects only the history of females. For the history of males, the non-recombining part of the paternally inherited Y chromosome can be studied. The extent of variation on the Y chromosome is so low that variation at particular sites known to be polymorphic rather than entire sequences are typically determined. It is currently unclear how some forms of analysis (such as the coalescent) should be applied to such data. Furthermore, the lack of recombination means that selection at any locus affects all 59 Mb of DNA. To gauge the extent and pattern of point substitutional variation in non-coding parts of the human genome, we have sequenced 10 kb of non-coding DNA in a region of low recombination at Xq13.3. Analysis of this sequence in 69 individuals representing all major linguistic groups reveals the highest overall diversity in Africa, whereas deep divergences also exist in Asia. The time elapsed since the most recent common ancestor (MRCA) is 535,000+/-119,000 years. We expect this type of nuclear locus to provide more answers about the genetic origin and history of humans. PMID:10319866

  6. Functional Divergence of APETALA1 and FRUITFULL is due to Changes in both Regulation and Coding Sequence

    PubMed Central

    McCarthy, Elizabeth W.; Mohamed, Abeer; Litt, Amy

    2015-01-01

    Gene duplications are prevalent in plants, and functional divergence subsequent to duplication may be linked with the occurrence of novel phenotypes in plant evolution. Here, we examine the functional divergence of Arabidopsis thaliana APETALA1 (AP1) and FRUITFULL (FUL), which arose via a duplication correlated with the origin of the core eudicots. Both AP1 and FUL play a role in floral meristem identity, but AP1 is required for the formation of sepals and petals whereas FUL is involved in cauline leaf and fruit development. AP1 and FUL are expressed in mutually exclusive domains but also differ in sequence, with unique conserved motifs in the C-terminal domains of the proteins that suggest functional differentiation. To determine whether the functional divergence of AP1 and FUL is due to changes in regulation or changes in coding sequence, we performed promoter swap experiments, in which FUL was expressed in the AP1 domain in the ap1 mutant and vice versa. Our results show that FUL can partially substitute for AP1, and AP1 can partially substitute for FUL; thus, the functional divergence between AP1 and FUL is due to changes in both regulation and coding sequence. We also mutated AP1 and FUL conserved motifs to determine if they are required for protein function and tested the ability of these mutated proteins to interact in yeast with known partners. We found that these motifs appear to play at best a minor role in protein function and dimerization capability, despite being strongly conserved. Our results suggest that the functional differentiation of these two paralogous key transcriptional regulators involves both differences in regulation and in sequence; however, sequence changes in the form of unique conserved motifs do not explain the differences observed. PMID:26697035

  7. β-Glucosidase coding sequences and protein from Orpinomyces PC-2

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong; Ximenes, Eduardo A.

    2001-02-06

    Provided is a novel β-glucosidase from Orpinomyces sp. PC2, nucleotide sequences encoding the mature protein and the precursor protein, and methods for recombinant production of this β-glucosidase.

  8. The code for directing proteins for translocation across ER membrane: SRP cotranslationally recognizes specific features of a signal sequence.

    PubMed

    Nilsson, IngMarie; Lara, Patricia; Hessa, Tara; Johnson, Arthur E; von Heijne, Gunnar; Karamyshev, Andrey L

    2015-03-27

    The signal recognition particle (SRP) cotranslationally recognizes signal sequences of secretory proteins and targets ribosome-nascent chain complexes to the SRP receptor in the endoplasmic reticulum membrane, initiating translocation of the nascent chain through the Sec61 translocon. Although signal sequences do not have homology, they have similar structural regions: a positively charged N-terminus, a hydrophobic core and a more polar C-terminal region that contains the cleavage site for the signal peptidase. Here, we have used site-specific photocrosslinking to study SRP-signal sequence interactions. A photoreactive probe was incorporated into the middle of wild-type or mutated signal sequences of the secretory protein preprolactin by in vitro translation of mRNAs containing an amber-stop codon in the signal peptide in the presence of the N(ε)-(5-azido-2 nitrobenzoyl)-Lys-tRNA(amb) amber suppressor. A homogeneous population of SRP-ribosome-nascent chain complexes was obtained by the use of truncated mRNAs in translations performed in the presence of purified canine SRP. Quantitative analysis of the photoadducts revealed that charged residues at the N-terminus of the signal sequence or in the early part of the mature protein have only a mild effect on the SRP-signal sequence association. However, deletions of amino acid residues in the hydrophobic portion of the signal sequence severely affect SRP binding. The photocrosslinking data correlate with targeting efficiency and translocation across the membrane. Thus, the hydrophobic core of the signal sequence is primarily responsible for its recognition and binding by SRP, while positive charges fine-tune the SRP-signal sequence affinity and targeting to the translocon. PMID:24979680
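
    One common way to locate a hydrophobic core computationally is a sliding-window average of Kyte-Doolittle hydropathy values. The sketch below illustrates that general technique only; it is not the photocrosslinking analysis used in the study, and the window size and the toy sequence are assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy scale (higher means more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def most_hydrophobic_window(seq, window=7):
    """Return (start index, mean hydropathy) of the most hydrophobic window."""
    seq = seq.upper()
    if len(seq) < window:
        raise ValueError("sequence shorter than window")
    best_start, best_mean = 0, float("-inf")
    for start in range(len(seq) - window + 1):
        mean = sum(KD[aa] for aa in seq[start:start + window]) / window
        if mean > best_mean:  # strict comparison keeps the first best window
            best_start, best_mean = start, mean
    return best_start, best_mean

# Invented toy peptide: charged N-terminus, leucine core, polar C-terminus.
toy_signal = "MKKRLLLLLLLASDE"
start, mean = most_hydrophobic_window(toy_signal)
print(start, round(mean, 2))  # the leucine run starts at index 4
```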

  9. A 40-kilodalton cell wall protein-coding sequence upstream of the sr gene of Streptococcus mutans OMZ175 (serotype f).

    PubMed Central

    Ogier, J A; Schöller, M; Lepoivre, Y; Gangloff, S; M'Zoughi, R; Klein, J P

    1991-01-01

    Streptococcus mutans surface proteins may be important in immunization against dental caries. We report the existence of an open reading frame of 1,005 bp that lies 1,162 bases upstream of the S. mutans OMZ175 sr gene and that encodes a cell wall-associated protein. This open reading frame codes for 335 amino acid residues. The first 18-amino acid region is predominantly hydrophobic and resembles a signal peptide, and the hydrophobic C-terminal region may function as an anchor to the bacterial cell wall. On the basis of the predicted antigenic determinants of the deduced amino acid sequence, a 16-residue synthetic peptide corresponding to the middle hydrophilic coiled region was synthesized. Antibodies raised against this synthetic peptide reacted with a protein with an apparent Mr of 40,000 that was identified by Western immunoblotting in a cell wall extract from S. mutans OMZ175. The high reactivity in an enzyme-linked immunosorbent assay of the antibodies with whole S. mutans OMZ175 cells showed that this protein was located on the bacterial cell surface. Furthermore, the antipeptide immunoglobulin G recognized an identical determinant on the cell surface of other members of the S. mutans group. However, the function of this protein is not yet known. PMID:2019433

  10. The complete coding region sequence of river buffalo (Bubalus bubalis) SRY gene.

    PubMed

    Parma, Pietro; Feligini, Maria; Greppi, Gianfranco; Enne, Giuseppe

    2004-02-01

    The Y-linked SRY gene is responsible for testis determination in mammals. Mutations in this gene can lead to XY Gonadal Dysgenesis, an abnormal sexual phenotype described in humans, cattle, horses and river buffalo. We report here the complete river buffalo SRY sequence in order to enable the genetic diagnosis of this disease. The SRY sequence was also used to confirm the evolutionary divergence time between cattle and river buffalo 10 million years ago. PMID:15354359

  11. Protein location prediction using atomic composition and global features of the amino acid sequence

    SciTech Connect

    Cherian, Betsy Sheena; Nair, Achuthsankar S.

    2010-01-22

    The subcellular location of a protein is useful information for determining its function, screening for drug candidates, vaccine design, annotation of gene products and selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full length protein sequence to incorporate more biological information. A new biological feature, the distribution of atomic composition, is used effectively together with multiple physicochemical properties, amino acid composition, three-part amino acid composition, and sequence similarity for predicting the subcellular location of the protein. Support Vector Machines are designed for four modules and prediction is made by a weighted voting system. Our system makes predictions with accuracies of 100%, 82.47% and 88.81% for the self-consistency test, jackknife test and independent data test, respectively. Our results provide evidence that prediction based on the biological features derived from the full length amino acid sequence gives better accuracy than prediction based on features derived from the N-terminal alone. Considering the features as a distribution within the entire sequence brings out the underlying property distribution in greater detail and so enhances the prediction accuracy.
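
    A weighted voting step of the kind described might look like the sketch below. The module names, weights and location labels are hypothetical; the abstract states only that four SVM modules are combined by weighted voting and does not give the weighting scheme.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine per-module labels into one label by summing module weights.

    predictions: module name -> predicted location label
    weights:     module name -> non-negative weight for that module
    """
    scores = defaultdict(float)
    for module, label in predictions.items():
        scores[label] += weights[module]
    return max(scores, key=scores.get)

# Hypothetical module outputs and weights, for illustration only.
predictions = {
    "atomic_composition": "nucleus",
    "physicochemical": "cytoplasm",
    "aa_composition": "nucleus",
    "sequence_similarity": "cytoplasm",
}
weights = {
    "atomic_composition": 0.2,
    "physicochemical": 0.3,
    "aa_composition": 0.2,
    "sequence_similarity": 0.4,
}
print(weighted_vote(predictions, weights))  # cytoplasm (0.7 against 0.4)
```

    With equal weights this reduces to a simple majority vote; unequal weights let a more reliable module outvote two weaker ones.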

  12. Sequence Variability in Viral Genome Non-coding Regions Likely Contribute to Observed Differences in Viral Replication Amongst MARV Strains

    PubMed Central

    ALONSO, JESUS A.; PATTERSON, JEAN L.

    2013-01-01

    The Marburg viruses Musoke (MARV-Mus) and Angola (MARV-Ang) have highly similar genomic sequences. Analysis of viral replication using various assays consistently identified MARV-Ang as the faster replicating virus. Non-coding genomic regions of negative sense RNA viruses are known to play a role in viral gene expression. A comparison of the six non-coding regions using bicistronic minigenomes revealed that the first two non-coding regions (NP / VP35 and VP35 / VP40) differed significantly in their transcriptional regulation. Deletion mutation analysis of the MARV-Mus NP / VP35 region further revealed that the MARV polymerase (L) is able to initiate production of the downstream gene without the presence of highly conserved regulatory signals. Bicistronic minigenome assays also identified the VP30 mRNA 5′ untranslated region as an rZAP-targeted RNA motif. Overall, our studies indicate that the high variation of MARV non-coding regions may play a significant role in observed differences in transcription and/or replication. PMID:23510675

  13. Next-Generation Sequencing of Protein-Coding and Long Non-protein-Coding RNAs in Two Types of Exosomes Derived from Human Whole Saliva.

    PubMed

    Ogawa, Yuko; Tsujimoto, Masafumi; Yanoshita, Ryohei

    2016-01-01

    Exosomes are small extracellular vesicles containing microRNAs and mRNAs that are produced by various types of cells. We previously used ultrafiltration and size-exclusion chromatography to isolate two types of human salivary exosomes (exosomes I, II) that are different in size and proteomes. We showed that salivary exosomes contain large repertoires of small RNAs. However, precise information regarding long RNAs in salivary exosomes has not been fully determined. In this study, we investigated the compositions of protein-coding RNAs (pcRNAs) and long non-protein-coding RNAs (lncRNAs) of exosome I, exosome II and whole saliva (WS) by next-generation sequencing technology. Although 11% of all RNAs were commonly detected among the three samples, the compositions of reads mapping to known RNAs were similar. The most abundant pcRNA is ribosomal RNA protein, and pcRNAs of some salivary proteins such as S100 calcium-binding protein A8 (protein S100-A8) were present in salivary exosomes. Interestingly, lncRNAs of pseudogenes (presumably, processed pseudogenes) were abundant in exosome I, exosome II and WS. Translationally controlled tumor protein gene, which plays an important role in cell proliferation, cell death and immune responses, was highly expressed as pcRNA and pseudogenes in salivary exosomes. Our results show that salivary exosomes contain various types of RNAs such as pseudogenes and small RNAs, and may mediate intercellular communication by transferring these RNAs to target cells as gene expression regulators. PMID:27582331

  14. Ab initio detection of fuzzy amino acid tandem repeats in protein sequences

    PubMed Central

    2012-01-01

    Background: Tandem repetitions within protein amino acid sequences often correspond to regular secondary structures and form multi-repeat 3D assemblies of varied size and function. Developing internal repetitions is one of the evolutionary mechanisms that proteins employ to adapt their structure and function under evolutionary pressure. While there is keen interest in understanding such phenomena, detection of repeating structures based only on sequence analysis is considered an arduous task, since structure and function are often preserved even under considerable sequence divergence (fuzzy tandem repeats). Results: In this paper we present PTRStalker, a new algorithm for ab-initio detection of fuzzy tandem repeats in protein amino acid sequences. In the reported results we show that by feeding PTRStalker with amino acid sequences from the UniProtKB/Swiss-Prot database we detect novel tandemly repeated structures not captured by other state-of-the-art tools. Experiments with membrane proteins indicate that PTRStalker can detect global symmetries in the primary structure which are then reflected in the tertiary structure. Conclusions: PTRStalker is able to detect fuzzy tandem repeating structures in protein sequences, with performance beyond the current state of the art. Such a tool may be a valuable support to investigating protein structural properties when tertiary X-ray data is not available. PMID:22536906
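
    To make the idea of a fuzzy tandem repeat concrete, the sketch below scores how well a sequence matches a copy of itself shifted by a candidate period. This is a deliberately naive periodicity score, not the PTRStalker algorithm, whose details are not given in this record; the function names are hypothetical.

```python
def period_score(seq, period):
    """Fraction of positions i where seq[i] equals seq[i + period].

    A score of 1.0 means an exact tandem repeat with that period;
    lower scores tolerate substitutions (a crude notion of "fuzzy").
    """
    if period <= 0 or period >= len(seq):
        raise ValueError("period must be between 1 and len(seq) - 1")
    pairs = len(seq) - period
    matches = sum(1 for i in range(pairs) if seq[i] == seq[i + period])
    return matches / pairs


def best_period(seq, max_period):
    """Return the candidate period with the highest score (smallest on ties)."""
    candidates = range(1, min(max_period, len(seq) - 1) + 1)
    return max(candidates, key=lambda p: period_score(seq, p))
```

    A score like this ignores insertions and deletions, which is one reason real detectors need more elaborate machinery.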

  15. Multimodal phylogeny for taxonomy: integrating information from nucleotide and amino acid sequences.

    PubMed

    Bicego, Manuele; Dellaglio, Franco; Felis, Giovanna E

    2007-10-01

    The crucial role played by the analysis of microbial diversity in biotechnology-based innovations has increased the interest in the microbial taxonomy research area. Phylogenetic sequence analyses have contributed significantly to the advances in this field, also in view of the large amount of sequence data collected in recent years. Phylogenetic analyses can be based on protein-encoding nucleotide sequences or on the encoded amino acid sequences: these two approaches have different peculiarities, although they start from two alternative representations of the same information. This complementarity could be exploited to achieve a multimodal phylogenetic scheme that is able to integrate gene and protein information in order to produce a single final tree. This aspect has been poorly addressed in the literature. In this paper, we propose to integrate the two phylogenetic analyses using basic schemes derived from the multimodality fusion theory (or multiclassifier systems theory), a well-founded and rigorous branch whose power has already been demonstrated in other pattern recognition contexts. The proposed approach can be applied to distance matrix-based phylogenetic techniques (like neighbor joining), resulting in a smart and fast method. The proposed methodology has been tested in a real case involving sequences of some species of lactic acid bacteria. With this dataset, both nucleotide sequence- and amino acid sequence-based phylogenetic analyses present some drawbacks, which are overcome with the multimodal analysis. PMID:17933011
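
    One basic fusion scheme that fits this description is a weighted sum of the two distance matrices after rescaling. The sketch below assumes that rule and a max-based normalization for illustration only; the record does not say which fusion rules the authors actually used, and the function names are hypothetical.

```python
def normalize(matrix):
    """Scale a distance matrix so that its largest entry becomes 1."""
    largest = max(max(row) for row in matrix)
    if largest == 0:
        return [row[:] for row in matrix]
    return [[d / largest for d in row] for row in matrix]


def fuse(dist_nt, dist_aa, weight_nt=0.5):
    """Weighted-sum fusion of nucleotide and amino acid distance matrices.

    Both matrices must describe the same taxa in the same order. The
    fused matrix could then be passed to a distance-based tree builder
    such as neighbor joining.
    """
    a, b = normalize(dist_nt), normalize(dist_aa)
    n = len(a)
    return [
        [weight_nt * a[i][j] + (1 - weight_nt) * b[i][j] for j in range(n)]
        for i in range(n)
    ]
```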

  16. The amino-acid sequence of leghemoglobin component a from Phaseolus vulgaris (kidney bean).

    PubMed

    Lehtovaara, P; Ellfolk, N

    1975-06-01

    1. Leghemoglobin component a from Phaseolus vulgaris (kidney bean) was digested with trypsin; 15 tryptic peptides and free lysine were purified and the amino acid sequences of the peptides determined. 2. The internal order of the tryptic peptides was determined by the bridge peptides obtained from the thermolytic digest and the dilute acid hydrolyzate of kidney bean leghemoglobin a; 12 thermolytic peptides and two acid hydrolysis peptides were purified and the sequences were partially or completely determined. 3. The complete amino acid sequence of kidney bean leghemoglobin a is compared to that of leghemoglobin a from soybean (Glycine max) and to some animal globins. As regards sequence, the kidney bean globin has 79% identity with the soybean globin and 21% identity with human hemoglobin gamma-chain. Seven of the 14 amino acid residues common to most globins are found in the kidney bean globin. Trp-15 and Tyr-145 are evolutionarily conserved in this globin, which confirms the concept of a common origin of animal and plant globins. PMID:809270
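
    Identity figures such as the 79% and 21% quoted here are computed from an alignment. The sketch below shows one common convention, counting identical residues over the columns where neither sequence has a gap; other denominators are also in use, and the record does not state which one the authors applied. The sequences in any call would be toy inputs, not the globin sequences themselves.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```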

  17. ADAR2 affects mRNA coding sequence edits with only modest effects on gene expression or splicing in vivo.

    PubMed

    Dillman, Allissa A; Cookson, Mark R; Galter, Dagmar

    2016-01-01

    Adenosine deaminases bind double stranded RNA and convert adenosine to inosine. Editing creates multiple isoforms of neurotransmitter receptors, such as with Gria2. Adar2 KO mice die of seizures shortly after birth, but if the Gria2 Q/R editing site is mutated to mimic the edited version then the animals are viable. We performed RNA-Seq on frontal cortices of Adar2(-/-) Gria2(R/R) mice and littermates. We found 56 editing sites with significantly diminished editing levels in Adar2 deficient animals with the majority in coding regions. Only two genes and 3 exons showed statistically significant differences in expression levels. This work illustrates that ADAR2 is important in site-specific changes of protein coding sequences but has relatively modest effects on gene expression and splicing in the adult mouse frontal cortex. PMID:26669816

  18. Construction and Analysis of a Novel 2-D Optical Orthogonal Codes Based on Modified One-coincidence Sequence

    NASA Astrophysics Data System (ADS)

    Ji, Jianhua; Wang, Yanfen; Wang, Ke; Xu, Ming; Zhang, Zhipeng; Yang, Shuwen

    2013-09-01

    A new two-dimensional OOC (optical orthogonal codes) named PC/MOCS is constructed, using PC (prime code) for time spreading and MOCS (modified one-coincidence sequence) for wavelength hopping. Compared with PC/PC, the number of wavelengths for PC/MOCS is not limited to a prime number. Compared with PC/OCS, the length of MOCS need not be expanded to the same length of PC. PC/MOCS can be constructed flexibly, and also can use available wavelengths effectively. Theoretical analysis shows that PC/MOCS can reduce the bit error rate (BER) of OCDMA system, and can support more users than PC/PC and PC/OCS.
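
    The time-spreading half of this construction can be illustrated with the textbook prime code over GF(p). The sketch covers only that standard prime code; the MOCS wavelength-hopping pattern and the combined PC/MOCS construction are not specified in this record and are not reproduced. Function names are hypothetical.

```python
def is_prime(n):
    """Trial-division primality check, adequate for small p."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


def prime_code(p):
    """Return the p codewords of the basic prime code as pulse positions.

    Each codeword spans p * p chips split into p blocks of p chips;
    codeword i places one pulse in block j at offset (i * j) mod p.
    """
    if not is_prime(p):
        raise ValueError("p must be prime")
    return [[j * p + (i * j) % p for j in range(p)] for i in range(p)]


def in_phase_overlap(code_a, code_b):
    """Number of chip positions where both codewords carry a pulse."""
    return len(set(code_a) & set(code_b))
```

    With no time shift, any two distinct codewords share only the pulse at chip 0, which is what keeps multi-user interference low.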

  19. The bioinformatics of nucleotide sequence coding for proteins requiring metal coenzymes and proteins embedded with metals

    NASA Astrophysics Data System (ADS)

    Tremberger, G.; Dehipawala, Sunil; Cheung, E.; Holden, T.; Sullivan, R.; Nguyen, A.; Lieberman, D.; Cheung, T.

    2015-09-01

    All metallo-proteins need post-translation metal incorporation. In fact, the isotope ratio of Fe, Cu, and Zn in physiology and oncology have emerged as an important tool. The nickel containing F430 is the prosthetic group of the enzyme methyl coenzyme M reductase which catalyzes the release of methane in the final step of methano-genesis, a prime energy metabolism candidate for life exploration space mission in the solar system. The 3.5 Gyr early life sulfite reductase as a life switch energy metabolism had Fe-Mo clusters. The nitrogenase for nitrogen fixation 3 billion years ago had Mo. The early life arsenite oxidase needed for anoxygenic photosynthesis energy metabolism 2.8 billion years ago had Mo and Fe. The selection pressure in metal incorporation inside a protein would be quantifiable in terms of the related nucleotide sequence complexity with fractal dimension and entropy values. Simulation model showed that the studied metal-required energy metabolism sequences had at least ten times more selection pressure relatively in comparison to the horizontal transferred sequences in Mealybug, guided by the outcome histogram of the correlation R-sq values. The metal energy metabolism sequence group was compared to the circadian clock KaiC sequence group using magnesium atomic level bond shifting mechanism in the protein, and the simulation model would suggest a much higher selection pressure for the energy life switch sequence group. The possibility of using Kepler 444 as an example of ancient life in Galaxy with the associated exoplanets has been proposed and is further discussed in this report. Examples of arsenic metal bonding shift probed by Synchrotron-based X-ray spectroscopy data and Zn controlled FOXP2 regulated pathways in human and chimp brain studied tissue samples are studied in relationship to the sequence bioinformatics. The analysis results suggest that relatively large metal bonding shift amount is associated with low probability correlation R

  20. Genome sequence of the acid-tolerant Burkholderia sp. strain WSM2232 from Karijini National Park, Australia

    PubMed Central

    Walker, Robert; Watkin, Elizabeth; Tian, Rui; Bräu, Lambert; O’Hara, Graham; Goodwin, Lynne; Han, James; Reddy, Tatiparthi; Huntemann, Marcel; Pati, Amrita; Woyke, Tanja; Mavromatis, Konstantinos; Markowitz, Victor; Ivanova, Natalia; Kyrpides, Nikos; Reeve, Wayne

    2013-01-01

    Burkholderia sp. strain WSM2232 is an aerobic, motile, Gram-negative, non-spore-forming acid-tolerant rod that was trapped in 2001 from acidic soil collected from Karijini National Park (Australia) using Gastrolobium capitatum as a host. WSM2232 was effective in nitrogen fixation with G. capitatum but subsequently lost symbiotic competence during long-term storage. Here we describe the features of Burkholderia sp. strain WSM2232, together with genome sequence information and its annotation. The 7,208,311 bp standard-draft genome is arranged into 72 scaffolds of 72 contigs containing 6,322 protein-coding genes and 61 RNA-only encoding genes. The loss of symbiotic capability can now be attributed to the loss of nodulation and nitrogen fixation genes from the genome. This rhizobial genome is one of 100 sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project. PMID:25197442

  1. Coding Local and Global Binary Visual Features Extracted From Video Sequences.

    PubMed

    Baroffio, Luca; Canclini, Antonio; Cesana, Matteo; Redondi, Alessandro; Tagliasacchi, Marco; Tubaro, Stefano

    2015-11-01

    Binary local features represent an effective alternative to real-valued descriptors, leading to comparable results for many visual analysis tasks while being characterized by significantly lower computational complexity and memory requirements. When dealing with large collections, a more compact representation based on global features is often preferred, which can be obtained from local features by means of, e.g., the bag-of-visual word model. Several applications, including, for example, visual sensor networks and mobile augmented reality, require visual features to be transmitted over a bandwidth-limited network, thus calling for coding techniques that aim at reducing the required bit budget while attaining a target level of efficiency. In this paper, we investigate a coding scheme tailored to both local and global binary features, which aims at exploiting both spatial and temporal redundancy by means of intra- and inter-frame coding. In this respect, the proposed coding scheme can conveniently be adopted to support the analyze-then-compress (ATC) paradigm. That is, visual features are extracted from the acquired content, encoded at remote nodes, and finally transmitted to a central controller that performs the visual analysis. This is in contrast with the traditional approach, in which visual content is acquired at a node, compressed and then sent to a central unit for further processing, according to the compress-then-analyze (CTA) paradigm. In this paper, we experimentally compare the ATC and the CTA by means of rate-efficiency curves in the context of two different visual analysis tasks: 1) homography estimation and 2) content-based retrieval. Our results show that the novel ATC paradigm based on the proposed coding primitives can be competitive with the CTA, especially in bandwidth limited scenarios. PMID:26080384
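
    Why binary descriptors are cheap to compare and to code differentially can be shown in a few lines. The sketch stores a descriptor as a Python integer and uses an XOR residual against a reference descriptor as a toy stand-in for inter-frame coding; it is an assumption for illustration, not the coding scheme proposed in the paper.

```python
def hamming(a, b):
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def inter_residual(current, reference):
    """XOR residual of a descriptor against its match in a previous frame.

    When the two descriptors are similar the residual has few set bits,
    so an entropy coder could represent it with fewer bits.
    """
    return current ^ reference


def reconstruct(residual, reference):
    """Recover the current descriptor from the residual and the reference."""
    return residual ^ reference
```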

  2. Using Triple Helix Forming Peptide Nucleic Acids for Sequence-selective Recognition of Double-stranded RNA

    PubMed Central

    Hnedzko, Dziyana; Cheruiyot, Samwel K.; Rozners, Eriks

    2014-01-01

    Non-coding RNAs play important roles in regulation of gene expression. Specific recognition and inhibition of these biologically important RNAs that form complex double-helical structures will be highly useful for fundamental studies in biology and practical applications in medicine. This protocol describes a strategy developed in our laboratory for sequence-selective recognition of double-stranded RNA (dsRNA) using triple helix forming peptide nucleic acids (PNAs) that bind in the major groove of the RNA helix. The strategy uses chemically modified nucleobases, such as 2-aminopyridine (M), which enables strong triple helical binding at physiologically relevant conditions, and 2-pyrimidinone (P) and 3-oxo-2,3-dihydropyridazine (E), which enable recognition of isolated pyrimidines in the purine rich strand of the RNA duplex. Detailed protocols for preparation of modified PNA monomers, solid-phase synthesis and HPLC purification of PNA oligomers, and measuring dsRNA binding affinity using isothermal titration calorimetry are included. PMID:25199637

  3. Short Time-Scale Sensory Coding in S1 during Discrimination of Whisker Vibrotactile Sequences.

    PubMed

    McGuire, Leah M; Telian, Gregory; Laboy-Juárez, Keven J; Miyashita, Toshio; Lee, Daniel J; Smith, Katherine A; Feldman, Daniel E

    2016-08-01

    Rodent whisker input consists of dense microvibration sequences that are often temporally integrated for perceptual discrimination. Whether primary somatosensory cortex (S1) participates in temporal integration is unknown. We trained rats to discriminate whisker impulse sequences that varied in single-impulse kinematics (5-20-ms time scale) and mean speed (150-ms time scale). Rats appeared to use the integrated feature, mean speed, to guide discrimination in this task, consistent with similar prior studies. Despite this, 52% of S1 units, including 73% of units in L4 and L2/3, encoded sequences at fast time scales (≤20 ms, mostly 5-10 ms), accurately reflecting single impulse kinematics. 17% of units, mostly in L5, showed weaker impulse responses and a slow firing rate increase during sequences. However, these units did not effectively integrate whisker impulses, but instead combined weak impulse responses with a distinct, slow signal correlated to behavioral choice. A neural decoder could identify sequences from fast unit spike trains and behavioral choice from slow units. Thus, S1 encoded fast time scale whisker input without substantial temporal integration across whisker impulses. PMID:27574970

  4. Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence

    PubMed Central

    Gordon, Kacy L.; Arthur, Robert K.; Ruvinsky, Ilya

    2015-01-01

    Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements. PMID:26020930

  5. Short Time-Scale Sensory Coding in S1 during Discrimination of Whisker Vibrotactile Sequences

    PubMed Central

    Miyashita, Toshio; Lee, Daniel J.; Smith, Katherine A.; Feldman, Daniel E.

    2016-01-01

    Rodent whisker input consists of dense microvibration sequences that are often temporally integrated for perceptual discrimination. Whether primary somatosensory cortex (S1) participates in temporal integration is unknown. We trained rats to discriminate whisker impulse sequences that varied in single-impulse kinematics (5–20-ms time scale) and mean speed (150-ms time scale). Rats appeared to use the integrated feature, mean speed, to guide discrimination in this task, consistent with similar prior studies. Despite this, 52% of S1 units, including 73% of units in L4 and L2/3, encoded sequences at fast time scales (≤20 ms, mostly 5–10 ms), accurately reflecting single impulse kinematics. 17% of units, mostly in L5, showed weaker impulse responses and a slow firing rate increase during sequences. However, these units did not effectively integrate whisker impulses, but instead combined weak impulse responses with a distinct, slow signal correlated to behavioral choice. A neural decoder could identify sequences from fast unit spike trains and behavioral choice from slow units. Thus, S1 encoded fast time scale whisker input without substantial temporal integration across whisker impulses. PMID:27574970

  6. A classification of glycosyl hydrolases based on amino acid sequence similarities.

    PubMed Central

    Henrissat, B

    1991-01-01

    The amino acid sequences of 301 glycosyl hydrolases and related enzymes have been compared. A total of 291 sequences corresponding to 39 EC entries could be classified into 35 families. Only ten sequences (less than 5% of the sample) could not be assigned to any family. With the sequences available for this analysis, 18 families were found to be monospecific (containing only one EC number) and 17 were found to be polyspecific (containing at least two EC numbers). Implications on the folding characteristics and mechanism of action of these enzymes and on the evolution of carbohydrate metabolism are discussed. With the steady increase in sequence and structural data, it is suggested that the enzyme classification system should perhaps be revised. PMID:1747104

  7. New families in the classification of glycosyl hydrolases based on amino acid sequence similarities.

    PubMed Central

    Henrissat, B; Bairoch, A

    1993-01-01

    301 glycosyl hydrolases and related enzymes corresponding to 39 EC entries of the I.U.B. classification system have been classified into 35 families on the basis of amino-acid-sequence similarities [Henrissat (1991) Biochem. J. 280, 309-316]. Approximately half of the families were found to be monospecific (containing only one EC number), whereas the other half were found to be polyspecific (containing at least two EC numbers). A > 60% increase in sequence data for glycosyl hydrolases (181 additional enzymes or enzyme domains sequences have since become available) allowed us to update the classification not only by the addition of more members to already identified families, but also by the finding of ten new families. On the basis of a comparison of 482 sequences corresponding to 52 EC entries, 45 families, out of which 22 are polyspecific, can now be defined. This classification has been implemented in the SWISS-PROT protein sequence data bank. PMID:8352747

  8. Sequence-specific purification of nucleic acids by PNA-controlled hybrid selection.

    PubMed

    Orum, H; Nielsen, P E; Jørgensen, M; Larsson, C; Stanley, C; Koch, T

    1995-09-01

    Using an oligohistidine peptide nucleic acid (oligohistidine-PNA) chimera, we have developed a rapid hybrid selection method that allows efficient, sequence-specific purification of a target nucleic acid. The method exploits two fundamental features of PNA. First, PNA binds with high affinity and specificity to its complementary nucleic acid. Second, amino acids are easily attached to the PNA oligomer during synthesis. We show that a (His)6-PNA chimera exhibits strong binding to chelated Ni2+ ions without compromising its native PNA hybridization properties. We further show that these characteristics allow the (His)6-PNA/DNA complex to be purified by the well-established method of metal ion affinity chromatography using a Ni(2+)-NTA (nitrilotriacetic acid) resin. Specificity and efficiency are the touchstones of any nucleic acid purification scheme. We show that the specificity of the (His)6-PNA selection approach is such that oligonucleotides differing by only a single nucleotide can be selectively purified. We also show that large RNAs (2224 nucleotides) can be captured with high efficiency by using multiple (His)6-PNA probes. PNA can hybridize to nucleic acids in low-salt concentrations that destabilize native nucleic acid structures. We demonstrate that this property of PNA can be utilized to purify an oligonucleotide in which the target sequence forms part of an intramolecular stem/loop structure. PMID:7495562

  9. [Cloning of full-length coding sequence of tree shrew CD4 and prediction of its molecular characteristics].

    PubMed

    Tian, Wei-Wei; Gao, Yue-Dong; Guo, Yan; Huang, Jing-Fei; Xiao, Chang; Li, Zuo-Sheng; Zhang, Hua-Tang

    2012-02-01

    The tree shrew, an ideal animal model that has received extensive attention in human disease research, requires essential research tools, in particular cellular markers and monoclonal antibodies for immunological studies. In this paper, a 1,365 bp full-length CD4 cDNA coding sequence was cloned from total RNA in peripheral blood of tree shrews. The sequence fills two unknown fragment gaps in the predicted tree shrew CD4 cDNA in the GenBank database, and its molecular characteristics were analyzed and compared with those of other mammals using software such as Clustal W2.0. The results showed that the extracellular and intracellular domains of the tree shrew CD4 amino acid sequence are conserved. The tree shrew CD4 amino acid sequence showed a close genetic relationship with Homo sapiens and Macaca mulatta. Most regions of the tree shrew CD4 molecule surface showed positive charges, as in humans. However, compared with the human CD4 extracellular domain D1, the tree shrew CD4 D1 surface showed more negative charges and two additional N-glycosylation sites, which may affect antibody binding. This study provides a theoretical basis for the preparation and functional studies of CD4 monoclonal antibody. PMID:22345010

  10. Cracking the Code of Human Diseases Using Next-Generation Sequencing: Applications, Challenges, and Perspectives

    PubMed Central

    Precone, Vincenza; Del Monaco, Valentina; Esposito, Maria Valeria; De Palma, Fatima Domenica Elisa; Ruocco, Anna; D'Argenio, Valeria

    2015-01-01

    Next-generation sequencing (NGS) technologies have greatly impacted on every field of molecular research mainly because they reduce costs and increase throughput of DNA sequencing. These features, together with the technology's flexibility, have opened the way to a variety of applications including the study of the molecular basis of human diseases. Several analytical approaches have been developed to selectively enrich regions of interest from the whole genome in order to identify germinal and/or somatic sequence variants and to study DNA methylation. These approaches are now widely used in research, and they are already being used in routine molecular diagnostics. However, some issues are still controversial, namely, standardization of methods, data analysis and storage, and ethical aspects. Besides providing an overview of the NGS-based approaches most frequently used to study the molecular basis of human diseases at DNA level, we discuss the principal challenges and applications of NGS in the field of human genomics. PMID:26665001

  11. Cracking the Code of Human Diseases Using Next-Generation Sequencing: Applications, Challenges, and Perspectives.

    PubMed

    Precone, Vincenza; Del Monaco, Valentina; Esposito, Maria Valeria; De Palma, Fatima Domenica Elisa; Ruocco, Anna; Salvatore, Francesco; D'Argenio, Valeria

    2015-01-01

    Next-generation sequencing (NGS) technologies have greatly impacted on every field of molecular research mainly because they reduce costs and increase throughput of DNA sequencing. These features, together with the technology's flexibility, have opened the way to a variety of applications including the study of the molecular basis of human diseases. Several analytical approaches have been developed to selectively enrich regions of interest from the whole genome in order to identify germinal and/or somatic sequence variants and to study DNA methylation. These approaches are now widely used in research, and they are already being used in routine molecular diagnostics. However, some issues are still controversial, namely, standardization of methods, data analysis and storage, and ethical aspects. Besides providing an overview of the NGS-based approaches most frequently used to study the molecular basis of human diseases at DNA level, we discuss the principal challenges and applications of NGS in the field of human genomics. PMID:26665001

  12. Antibody-specific model of amino acid substitution for immunological inferences from alignments of antibody sequences.

    PubMed

    Mirsky, Alexander; Kazandjian, Linda; Anisimova, Maria

    2015-03-01

    Antibodies are glycoproteins produced by the immune system as a dynamically adaptive line of defense against invading pathogens. Very elegant and specific mutational mechanisms allow B lymphocytes to produce a large and diversified repertoire of antibodies, which is modified and enhanced throughout all adulthood. One of these mechanisms is somatic hypermutation, which stochastically mutates nucleotides in the antibody genes, forming new sequences with different properties and, eventually, higher affinity and selectivity to the pathogenic target. As somatic hypermutation involves fast mutation of antibody sequences, this process can be described using a Markov substitution model of molecular evolution. Here, using large sets of antibody sequences from mice and humans, we infer an empirical amino acid substitution model AB, which is specific to antibody sequences. Compared with existing general amino acid models, we show that the AB model provides significantly better description for the somatic evolution of mice and human antibody sequences, as demonstrated on large next generation sequencing (NGS) antibody data. General amino acid models are reflective of conservation at the protein level due to functional constraints, with most frequent amino acids exchanges taking place between residues with the same or similar physicochemical properties. In contrast, within the variable part of antibody sequences we observed an elevated frequency of exchanges between amino acids with distinct physicochemical properties. This is indicative of a sui generis mutational mechanism, specific to antibody somatic hypermutation. We illustrate this property of antibody sequences by a comparative analysis of the network modularity implied by the AB model and general amino acid substitution models. We recommend using the new model for computational studies of antibody sequence maturation, including inference of alignments and phylogenetic trees describing antibody somatic hypermutation in

  13. Molecular differentiation of Nosema apis and Nosema ceranae based on species-specific sequence differences in a protein coding gene.

    PubMed

    Gisder, Sebastian; Genersch, Elke

    2013-05-01

    Nosema apis and Nosema ceranae are two microsporidian pathogens of the European honey bee, Apis mellifera. There is evidence that N. ceranae is more virulent than N. apis subject to environmental factors like climate. This makes N. ceranae one of the suspects in the increasing colony losses recently observed in many regions of the world. Correct differentiation between N. apis and N. ceranae is important and best accomplished by molecular methods. So far only protocols based on species-specific sequence differences in the 16S rRNA gene are available. However, recent studies indicated that these methods may lead to confusing results due to polymorphisms in and recombination between the multi-copy 16S rRNA genes. To solve this problem and to provide a reliable molecular tool for the differentiation between the two bee pathogenic microsporidia we here present and evaluate a duplex-PCR protocol based on species-specific sequence differences in the highly conserved gene coding for the DNA-dependent RNA polymerase II largest subunit. A total of 102 honey bee samples were analyzed by the novel PCR protocol and the results were compared with the results of the originally published PCR-RFLP analysis and two recently published differentiation protocols, based on 16S rRNA sequence differences. Although the novel PCR protocol proved to be as reliable as the 16S rRNA gene based PCR-RFLP it was superior to simple 16S rRNA based PCR protocols which tended to overestimate the rate of N. ceranae infections. Therefore, we propose that species-specific sequence differences of highly conserved protein coding genes should become the preferred molecular tool for differentiation of Nosema spp. PMID:23352902

  14. Ribosome-mediated incorporation of a non-standard amino acid into a peptide through expansion of the genetic code.

    PubMed

    Bain, J D; Switzer, C; Chamberlin, A R; Benner, S A

    1992-04-01

    One serious limitation facing protein engineers is the availability of only 20 'proteinogenic' amino acids encoded by natural messenger RNA. The lack of structural diversity among these amino acids restricts the mechanistic and structural issues that can be addressed by site-directed mutagenesis. Here we describe a new technology for incorporating non-standard amino acids into polypeptides by ribosome-based translation. In this technology, the genetic code is expanded through the creation of a 65th codon-anticodon pair from unnatural nucleoside bases having non-standard hydrogen-bonding patterns. This new codon-anticodon pair efficiently supports translation in vitro to yield peptides containing a non-standard amino acid. The versatility of the ribosome as a synthetic tool offers new possibilities for protein engineering, and compares favourably with another recently described approach in which the genetic code is simply rearranged to recruit stop codons to play a coding role. PMID:1560827

  15. Systematic analysis of mRNA 5' coding sequence incompleteness in Danio rerio: an automated EST-based approach

    PubMed Central

    Frabetti, Flavia; Casadei, Raffaella; Lenzi, Luca; Canaider, Silvia; Vitale, Lorenza; Facchin, Federica; Carinci, Paolo; Zannotti, Maria; Strippoli, Pierluigi

    2007-01-01

    Background: All standard methods for cDNA cloning are affected by a potential inability to effectively clone the 5' region of mRNA. The aim of this work was to estimate mRNA open reading frame (ORF) 5' region sequence completeness in the model organism Danio rerio (zebrafish). Results: We implemented a novel automated approach (5'_ORF_Extender) that systematically compares available expressed sequence tags (ESTs) with all the zebrafish experimentally determined mRNA sequences, identifies additional sequence stretches at 5' region and scans for the presence of all conditions needed to define a new, extended putative ORF. Our software was able to identify 285 (3.3%) mRNAs with putatively incomplete ORFs at 5' region and, in three example cases selected (selt1a, unc119.2, nppa), the extended coding region at 5' end was cloned by reverse transcription-polymerase chain reaction (RT-PCR). Conclusion: The implemented method, which could also be useful for the analysis of other genomes, allowed us to describe the relevance of the "5' end mRNA artifact" problem for genomic annotation and functional genomic experiment design in zebrafish. Open peer review: This article was reviewed by Alexey V. Kochetov (nominated by Mikhail Gelfand), Shamil Sunyaev, and Gáspár Jékely. For the full reviews, please go to the Reviewers' Comments section. PMID:18042283
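
The abstract above does not spell out which conditions 5'_ORF_Extender checks. As a rough illustration only, the Python sketch below assumes two plausible ones: an in-frame ATG upstream of the annotated start, and no in-frame stop codon in between. The function name `extend_orf_5prime` and the toy sequence are hypothetical and are not taken from the published software.

```python
# Standard genetic code stop codons (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def extend_orf_5prime(seq, start):
    """Return the index of the most upstream in-frame ATG that can be reached
    from `start` without crossing an in-frame stop codon.

    `seq` is an mRNA/cDNA sequence already extended at its 5' end (for example
    with EST-derived bases); `start` is the index of the annotated start codon.
    If no upstream ATG qualifies, `start` itself is returned.
    """
    best = start
    pos = start - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break          # an in-frame stop ends the search
        if codon == "ATG":
            best = pos     # remember the most upstream start seen so far
        pos -= 3
    return best

# Toy example: two leading bases, an upstream ATG, then the annotated ORF.
extended = "CC" + "ATGGCAAAA" + "ATGGCCTAA"
new_start = extend_orf_5prime(extended, start=11)
print(new_start)
```

Under these assumptions, a returned index smaller than the annotated start would flag the ORF as putatively incomplete at its 5' end. A real pipeline would also need the EST-to-mRNA alignment step, which is omitted here.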

  16. cDNA sequence coding for the alpha'-chain of the third complement component in the African lungfish.

    PubMed

    Sato, A; Sültmann, H; Mayer, W E; Figueroa, F; Tichy, H; Klein, J

    1999-04-01

    cDNA clones coding for almost the entire C3 alpha-chain of the African lungfish (Protopterus aethiopicus), a representative of the Sarcopterygii (lobe-finned fishes), were sequenced and characterized. From the sequence it is deduced that the lungfish C3 molecule is probably a disulphide-bonded alpha:beta dimer similar to that of the C3 components of other jawed vertebrates. The deduced sequence contains conserved sites presumably recognized by proteolytic enzymes (e.g. factor I) involved in the activation and inactivation of the component. It also contains the conserved thioester region and the putative site for binding properdin. However, the sites for interaction with complement receptor 2 and factor H are poorly conserved. Either complement receptor 2 and factor H are not present in the lungfish or they bind to different residues at the same or a different site than mammalian complement receptor 2 and factor H. The C3 alpha-chain sequences faithfully reflect the phylogenetic relationships among vertebrate classes and can therefore be used to help to resolve the long-standing controversy concerning the origin of the tetrapods. PMID:10219761

  17. Analysis of mutations in the entire coding sequence of the factor VIII gene

    SciTech Connect

    Bidichadani, S.I.; Lanyon, W.G.; Connor, J.M.

    1994-09-01

    Hemophilia A is a common X-linked recessive disorder of bleeding caused by deleterious mutations in the gene for clotting factor VIII. The large size of the factor VIII gene, the high frequency of de novo mutations and its tissue-specific expression complicate the detection of mutations. We have used a combination of RT-PCR of ectopic factor VIII transcripts and genomic DNA-PCRs to amplify the entire essential sequence of the factor VIII gene. This is followed by chemical mismatch cleavage analysis and direct sequencing in order to facilitate a comprehensive search for mutations. We describe the characterization of nine potentially pathogenic mutations, six of which are novel. In each case, a correlation of the genotype with the observed phenotype is presented. In order to evaluate the pathogenicity of the five missense mutations detected, we have analyzed them for evolutionary sequence conservation and for their involvement of sequence motifs catalogued in the PROSITE database of protein sites and patterns.

  18. Molecular phylogenetic analysis in Hammondia-like organisms based on partial Hsp70 coding sequences

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The 70-kDa heat shock proteins (Hsp70) are considered among the most conserved proteins in all domains of life, from Archaea to eukaryotes. Hammondia heydorni, H. hammondi, Toxoplasma gondii, Neospora hughesi and N. caninum (Hammondia-like organisms) are closely related tissue cyst-forming c...

  19. ANATOMICAL MNEMONICS OF THE GENETIC CODE: A FUNCTIONAL ICOSAHEDRON AND THE VIGESIMAL SYSTEM OF THE MAYA TO REPRESENT THE TWENTY PROTEINOGENIC AMINO ACIDS

    PubMed Central

    CASTRO-CHAVEZ, FERNANDO

    2016-01-01

    In programming and bioinformatics, the graphical interface is vital to describe and to abbreviate aspects and concepts of the physical world. The Mayan Culture developed the vigesimal system, a numerical system based on their count of fingers and toes. My objective is to equate the Mayan system and their numerical representation to the twenty amino acids according to size, except for the number one, represented by a dot, that here is given to cysteine, which acts as glue among peptides as one of its properties; in such a way, two vertical dots will be easily used to represent its related selenocysteine. The Mayan numerical system included the zero, represented by the Maya with an empty shell that here is used to represent the stop codons. On the other hand, the Chinese had a binary numerical system, similar to the binary comparisons of the three properties of Nucleotides within the double helix: H-Bonds, C-Rings and Tautomerism, called the I Ching which here is applied to the natural groups of amino acids that result of the 64-codons compared in binary in their H-Bonds versus their C-Rings, used here to successfully represent the mature sequence of the glucagon amino acids. Additional anatomical tools for the mnemonics of the genetic code and of its amino acid groups are also presented, as well as a functional icosahedron to represent them. Concluding, tools are presented for the visual analysis of proteins and peptide sequencing in bioinformatics and education to teach the genetic code and its resulting amino acids, plus their numerical systems. PMID:27081676

  20. Amino acid sequence of a vitamin K-dependent Ca2+-binding peptide from bovine prothrombin.

    PubMed

    Howard, J B; Fausch, M D

    1975-08-10

    The amino acid sequence of a 31-residue peptide from bovine prothrombin has been determined. This peptide has been shown to contain the vitamin K-dependent modification required for Ca2+ binding (Nelsestuen, G. L., and Suttie, J. W. (1973) Proc. Natl. Acad. Sci. U. S. A. 70, 3366-3370) and the modified amino acid, gamma-carboxyglutamic acid (Nelsestuen, G. L., Zytkovicz, T., and Howard, J. B. (1974) J. Biol. Chem. 249, 6347-6350). The peptide was shown to correspond to residues 12 to 42 of prothrombin. PMID:807581

  1. Amino acid sequences around the cysteine residues of rabbit muscle triose phosphate isomerase

    PubMed Central

    Miller, Janet C.; Waley, S. G.

    1971-01-01

    1. The nature of the subunits in rabbit muscle triose phosphate isomerase has been investigated. 2. Amino acid analyses show that there are five cysteine residues and two methionine residues/subunit. 3. The amino acid sequences around the cysteine residues have been determined; these account for about 75 residues. 4. Cleavage at the methionine residues with cyanogen bromide gave three fragments. 5. These results show that the subunits correspond to polypeptide chains, containing about 230 amino acid residues. The chains in triose phosphate isomerase seem to be shorter than those of other glycolytic enzymes. PMID:5165707

  2. Draft Genome Sequence of the Butyric Acid Producer Clostridium tyrobutyricum Strain CIP I-776 (IFP923)

    PubMed Central

    Clément, Benjamin; Lopes Ferreira, Nicolas

    2016-01-01

    Here, we report the draft genome sequence of Clostridium tyrobutyricum CIP I-776 (IFP923), an efficient producer of butyric acid. The genome consists of a single chromosome of 3.19 Mb and provides useful data concerning the metabolic capacities of the strain. PMID:26941139

  3. Draft Genome Sequence of Perfluorooctane Acid-Degrading Bacterium Pseudomonas parafulva YAB-1

    PubMed Central

    Tang, Chongjian; Peng, Qingjing; Peng, Qingzhong

    2015-01-01

    Pseudomonas parafulva YAB-1, isolated from perfluorinated compound-contaminated soil, has the ability to degrade perfluorooctane acid (PFOA) compound. Here, we report the draft genome sequence and annotation of the PFOA-degrading bacterium P. parafulva YAB-1. The data provide the basis to investigate the molecular mechanism of PFOA metabolism. PMID:26337877

  4. The amino acid sequence of cytochrome c-555 from the methane-oxidizing bacterium Methylococcus capsulatus.

    PubMed Central

    Ambler, R P; Dalton, H; Meyer, T E; Bartsch, R G; Kamen, M D

    1986-01-01

    The amino acid sequence of the cytochrome c-555 from the obligate methanotroph Methylococcus capsulatus strain Bath (N.C.I.B. 11132) was determined. It is a single polypeptide chain of 96 residues, binding a haem group through the cysteine residues at positions 19 and 22, and the only methionine residue is at position 59. The sequence does not closely resemble that of any other cytochrome c that has yet been characterized. Detailed evidence for the amino acid sequence of the protein has been deposited as Supplementary Publication SUP 50131 (12 pages) at the British Library Lending Division, Boston Spa, West Yorkshire LS23 7BQ, U.K., from whom copies are available on prepayment. PMID:3006666

  5. Integration of Expressed Sequence Tag Data Flanking Predicted RNA Secondary Structures Facilitates Novel Non-Coding RNA Discovery

    PubMed Central

    Krzyzanowski, Paul M.; Price, Feodor D.; Muro, Enrique M.; Rudnicki, Michael A.; Andrade-Navarro, Miguel A.

    2011-01-01

    Many computational methods have been used to predict novel non-coding RNAs (ncRNAs), but none, to our knowledge, have explicitly investigated the impact of integrating existing cDNA-based Expressed Sequence Tag (EST) data that flank structural RNA predictions. To determine whether flanking EST data can assist in microRNA (miRNA) prediction, we identified genomic sites encoding putative miRNAs by combining functional RNA predictions with flanking ESTs data in a model consistent with miRNAs undergoing cleavage during maturation. In both human and mouse genomes, we observed that the inclusion of flanking ESTs adjacent to and not overlapping predicted miRNAs significantly improved the performance of various methods of miRNA prediction, including direct high-throughput sequencing of small RNA libraries. We analyzed the expression of hundreds of miRNAs predicted to be expressed during myogenic differentiation using a customized microarray and identified several known and predicted myogenic miRNA hairpins. Our results indicate that integrating ESTs flanking structural RNA predictions improves the quality of cleaved miRNA predictions and suggest that this strategy can be used to predict other non-coding RNAs undergoing cleavage during maturation. PMID:21698286

  6. Integration of expressed sequence tag data flanking predicted RNA secondary structures facilitates novel non-coding RNA discovery.

    PubMed

    Krzyzanowski, Paul M; Price, Feodor D; Muro, Enrique M; Rudnicki, Michael A; Andrade-Navarro, Miguel A

    2011-01-01

    Many computational methods have been used to predict novel non-coding RNAs (ncRNAs), but none, to our knowledge, have explicitly investigated the impact of integrating existing cDNA-based Expressed Sequence Tag (EST) data that flank structural RNA predictions. To determine whether flanking EST data can assist in microRNA (miRNA) prediction, we identified genomic sites encoding putative miRNAs by combining functional RNA predictions with flanking ESTs data in a model consistent with miRNAs undergoing cleavage during maturation. In both human and mouse genomes, we observed that the inclusion of flanking ESTs adjacent to and not overlapping predicted miRNAs significantly improved the performance of various methods of miRNA prediction, including direct high-throughput sequencing of small RNA libraries. We analyzed the expression of hundreds of miRNAs predicted to be expressed during myogenic differentiation using a customized microarray and identified several known and predicted myogenic miRNA hairpins. Our results indicate that integrating ESTs flanking structural RNA predictions improves the quality of cleaved miRNA predictions and suggest that this strategy can be used to predict other non-coding RNAs undergoing cleavage during maturation. PMID:21698286

  7. The β-sheets of proteins, the biosynthetic relationships between amino acids, and the origin of the genetic code

    NASA Astrophysics Data System (ADS)

    di Giulio, Massimo

    1996-12-01

    Two forces are generally hypothesised as being responsible for conditioning the origin of the organization of the genetic code: the physicochemical properties of amino acids and their biosynthetic relationships (relationships between precursor and product amino acids). If we assume that the biosynthetic relationships between amino acids were fundamental in defining the genetic code, then it is reasonable to expect that the distribution of physicochemical properties among the amino acids in precursor-product relationships cannot be random but must, rather, be affected by some selective constraints imposed by the structure of primitive proteins. Analysis shows that measurements representing the ‘size’ of amino acids, e.g. bulkiness, are specifically associated with the pairs of amino acids in precursor-product relationships. However, the size of amino acids cannot have been selected per se but, rather, because it reflects the β-sheets of proteins, which are therefore identified as the main adaptive theme promoting the origin of genetic code organization, whereas there are no traces of the α-helix in the genetic code table. The above considerations make it necessary to re-examine the relationship linking the hydrophilicity of the dinucleoside monophosphates of anticodons and the polarity and bulkiness of amino acids. It can be concluded that this relationship seems to be meaningful only between the hydrophilicity of anticodons and the polarity of amino acids. The latter relationship is supposed to have been operative on hairpin structures, ancestors of the tRNA molecule. Moreover, it is on these very structures that the biosynthetic links between precursor and product amino acids might have been achieved, and the interaction between the hydrophilicity of anticodons and the polarity of amino acids might have had a role in the concession of codons (anticodons) from precursors to products.

  8. Complete Coding Sequences of One H9 and Three H7 Low-Pathogenic Influenza Viruses Circulating in Wild Birds in Belgium, 2009 to 2012

    PubMed Central

    Rosseel, Toon; Marché, Sylvie; Steensels, Mieke; Vangeluwe, Didier; Linden, Annick; van den Berg, Thierry; Lambrecht, Bénédicte

    2016-01-01

    The complete coding sequences of four avian influenza A viruses (two H7N7, one H7N1, and one H9N2) circulating in wild waterfowl in Belgium from 2009 to 2012 were determined using Illumina sequencing. All viral genome segments represent viruses circulating in the Eurasian wild bird population. PMID:27284153

  9. Complete Coding Sequences of One H9 and Three H7 Low-Pathogenic Influenza Viruses Circulating in Wild Birds in Belgium, 2009 to 2012.

    PubMed

    Van Borm, Steven; Rosseel, Toon; Marché, Sylvie; Steensels, Mieke; Vangeluwe, Didier; Linden, Annick; van den Berg, Thierry; Lambrecht, Bénédicte

    2016-01-01

    The complete coding sequences of four avian influenza A viruses (two H7N7, one H7N1, and one H9N2) circulating in wild waterfowl in Belgium from 2009 to 2012 were determined using Illumina sequencing. All viral genome segments represent viruses circulating in the Eurasian wild bird population. PMID:27284153

  10. Allelic polymorphism in arabian camel ribonuclease and the amino acid sequence of bactrian camel ribonuclease.

    PubMed

    Welling, G W; Mulder, H; Beintema, J J

    1976-04-01

    Pancreatic ribonucleases from several species (whitetail deer, roe deer, guinea pig, and arabian camel) exhibit more than one amino acid at particular positions in their amino acid sequences. Since these enzymes were isolated from pooled pancreas, the origin of this heterogeneity is not clear. The pancreatic ribonucleases from 11 individual arabian camels (Camelus dromedarius) have been investigated with respect to the lysine-glutamine heterogeneity at position 103 (Welling et al., 1975). Six ribonucleases showed only one basic band and five showed two bands after polyacrylamide gel electrophoresis, suggesting a gene frequency of about 0.75 for the Lys gene and about 0.25 for the Gln gene. The amino acid sequence of bactrian camel (Camelus bactrianus) ribonuclease isolated from individual pancreatic tissue was determined and compared with that of arabian camel ribonuclease. The only difference was observed at position 103. In the ribonucleases from two unrelated bactrian camels, only glutamine was observed at that position. PMID:962846
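
The gene frequencies quoted above can be checked by simple allele counting. The Python sketch below assumes that the five two-band camels are Lys/Gln heterozygotes and that the six one-band camels are all Lys/Lys homozygotes. The abstract does not state the second point explicitly, so treat it as an assumption. The function name `allele_frequencies` is made up for illustration.

```python
def allele_frequencies(n_hom_a, n_het, n_hom_b):
    """Return (freq_a, freq_b) from diploid genotype counts.

    Each homozygote contributes two copies of its allele and each
    heterozygote contributes one copy of each allele.
    """
    total_alleles = 2 * (n_hom_a + n_het + n_hom_b)
    if total_alleles == 0:
        raise ValueError("at least one individual is required")
    count_a = 2 * n_hom_a + n_het
    count_b = 2 * n_hom_b + n_het
    return count_a / total_alleles, count_b / total_alleles

# Assumed genotype counts for the 11 arabian camels: 6 Lys/Lys, 5 Lys/Gln, 0 Gln/Gln.
lys, gln = allele_frequencies(n_hom_a=6, n_het=5, n_hom_b=0)
print(round(lys, 2), round(gln, 2))
```

With these counts the estimate is 17/22 for Lys and 5/22 for Gln, which is consistent with the "about 0.75" and "about 0.25" reported in the abstract.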

  11. Emergence and Evolution of Hominidae-Specific Coding and Noncoding Genomic Sequences

    PubMed Central

    Saber, Morteza Mahmoudi; Adeyemi Babarinde, Isaac; Hettiarachchi, Nilmini; Saitou, Naruya

    2016-01-01

    Family Hominidae, which includes humans and great apes, is recognized for unique complex social behavior and intellectual abilities. Despite the increasing genome data, however, the genomic origin of its phenotypic uniqueness has remained elusive. Clade-specific genes and highly conserved noncoding sequences (HCNSs) are among the high-potential evolutionary candidates involved in driving clade-specific characters and phenotypes. On this premise, we analyzed whole genome sequences along with gene orthology data retrieved from major DNA databases to find Hominidae-specific (HS) genes and HCNSs. We discovered that Down syndrome critical region 4 (DSCR4) is the only experimentally verified gene uniquely present in Hominidae. DSCR4 has no structural homology to any known protein and was inferred to have emerged in several steps through LTR/ERV1, LTR/ERVL retrotransposition, and transversion. Using the genomic distance as neutral evolution threshold, we identified 1,658 HS HCNSs. Polymorphism coverage and derived allele frequency analysis of HS HCNSs showed that these HCNSs are under purifying selection, indicating that they may harbor important functions. They are overrepresented in promoters/untranslated regions, in close proximity of genes involved in sensory perception of sound and developmental process, and also showed a significantly lower nucleosome occupancy probability. Interestingly, many ancestral sequences of the HS HCNSs showed very high evolutionary rates. This suggests that new functions emerged through some kind of positive selection, and then purifying selection started to operate to keep these functions. PMID:27289096

  12. Sequencing rare and common APOL1 coding variants to determine kidney disease risk.

    PubMed

    Limou, Sophie; Nelson, George W; Lecordier, Laurence; An, Ping; O'hUigin, Colm S; David, Victor A; Binns-Roemer, Elizabeth A; Guiblet, Wilfried M; Oleksyk, Taras K; Pays, Etienne; Kopp, Jeffrey B; Winkler, Cheryl A

    2015-10-01

    A third of African Americans with sporadic focal segmental glomerulosclerosis (FSGS) or HIV-associated nephropathy (HIVAN) do not carry APOL1 renal risk genotypes. This raises the possibility that other APOL1 variants may contribute to kidney disease. To address this question, we sequenced all APOL1 exons in 1437 Americans of African and European descent, including 464 patients with biopsy-proven FSGS/HIVAN. Testing for association with 33 common and rare variants with FSGS/HIVAN revealed no association independent of strong recessive G1 and G2 effects. Seeking additional variants that might have been under selection by pathogens and could represent candidates for kidney disease risk, we also sequenced an additional 1112 individuals representing 53 global populations. Except for G1 and G2, none of the 7 common codon-altering variants showed evidence of selection or could restore lysis against trypanosomes causing human African trypanosomiasis. Thus, only APOL1 G1 and G2 confer renal risk, and other common and rare APOL1 missense variants, including the archaic G3 haplotype, do not contribute to sporadic FSGS and HIVAN in the US population. Hence, in most potential clinical or screening applications, our study suggests that sequencing APOL1 exons is unlikely to bring additional information compared to genotyping only APOL1 G1 and G2 risk alleles. PMID:25993319

  13. Emergence and Evolution of Hominidae-Specific Coding and Noncoding Genomic Sequences.

    PubMed

    Saber, Morteza Mahmoudi; Adeyemi Babarinde, Isaac; Hettiarachchi, Nilmini; Saitou, Naruya

    2016-01-01

    Family Hominidae, which includes humans and great apes, is recognized for unique complex social behavior and intellectual abilities. Despite the increasing genome data, however, the genomic origin of its phenotypic uniqueness has remained elusive. Clade-specific genes and highly conserved noncoding sequences (HCNSs) are among the high-potential evolutionary candidates involved in driving clade-specific characters and phenotypes. On this premise, we analyzed whole genome sequences along with gene orthology data retrieved from major DNA databases to find Hominidae-specific (HS) genes and HCNSs. We discovered that Down syndrome critical region 4 (DSCR4) is the only experimentally verified gene uniquely present in Hominidae. DSCR4 has no structural homology to any known protein and was inferred to have emerged in several steps through LTR/ERV1, LTR/ERVL retrotransposition, and transversion. Using the genomic distance as neutral evolution threshold, we identified 1,658 HS HCNSs. Polymorphism coverage and derived allele frequency analysis of HS HCNSs showed that these HCNSs are under purifying selection, indicating that they may harbor important functions. They are overrepresented in promoters/untranslated regions, in close proximity of genes involved in sensory perception of sound and developmental process, and also showed a significantly lower nucleosome occupancy probability. Interestingly, many ancestral sequences of the HS HCNSs showed very high evolutionary rates. This suggests that new functions emerged through some kind of positive selection, and then purifying selection started to operate to keep these functions. PMID:27289096

  14. MiR-10a* up-regulates coxsackievirus B3 biosynthesis by targeting the 3D-coding sequence

    PubMed Central

    Tong, Lei; Lin, Lexun; Wu, Shuo; Guo, Zhiwei; Wang, Tianying; Qin, Ying; Wang, Ruixue; Zhong, Xiaoyan; Wu, Xia; Wang, Yan; Luan, Tian; Wang, Qiang; Li, Yunxia; Chen, Xiaofeng; Zhang, Fengmin; Zhao, Wenran; Zhong, Zhaohua

    2013-01-01

    MicroRNAs (miRNAs) are small non-coding RNAs that can posttranscriptionally regulate gene expression by targeting messenger RNAs. During miRNA biogenesis, the star strand (miRNA*) is generally degraded to a low level in the cells. However, certain miRNA* express abundantly and can be recruited into the silencing complex to regulate gene expression. Most miRNAs function as suppressive regulators on gene expression. Group B coxsackieviruses (CVB) are the major pathogens of human viral myocarditis and dilated cardiomyopathy. CVB genome is a positive-sense, single-stranded RNA. Our previous study shows that miR-342-5p can suppress CVB biogenesis by targeting its 2C-coding sequence. In this study, we found that the miR-10a duplex could significantly up-regulate the biosynthesis of CVB type 3 (CVB3). Further study showed that it was the miR-10a star strand (miR-10a*) that augmented CVB3 biosynthesis. Site-directed mutagenesis showed that the miR-10a* target was located in the nt6818–nt6941 sequence of the viral 3D-coding region. MiR-10a* was detectable in the cardiac tissues of suckling Balb/c mice, suggesting that miR-10a* may impact CVB3 replication during its cardiac infection. Taken together, these data for the first time show that miRNA* can positively modulate gene expression. MiR-10a* might be involved in the CVB3 cardiac pathogenesis. PMID:23389951

  15. Use of a structural alphabet to find compatible folds for amino acid sequences

    PubMed Central

    Mahajan, Swapnil; de Brevern, Alexandre G; Sanejouand, Yves-Henri; Srinivasan, Narayanaswamy; Offmann, Bernard

    2015-01-01

    The structural annotation of proteins with no detectable homologs of known 3D structure identified using sequence-search methods is a major challenge today. We propose an original method that computes the conditional probabilities for the amino-acid sequence of a protein to fit to known protein 3D structures using a structural alphabet, known as “Protein Blocks” (PBs). PBs constitute a library of 16 local structural prototypes that approximate every part of protein backbone structures. It is used to encode 3D protein structures into 1D PB sequences and to capture sequence to structure relationships. Our method relies on amino acid occurrence matrices, one for each PB, to score global and local threading of query amino acid sequences to protein folds encoded into PB sequences. It does not use any information from residue contacts or sequence-search methods or explicit incorporation of hydrophobic effect. The performance of the method was assessed with independent test datasets derived from SCOP 1.75A. With a Z-score cutoff that achieved 95% specificity (i.e., less than 5% false positives), global and local threading showed sensitivity of 64.1% and 34.2%, respectively. We further tested its performance on 57 difficult CASP10 targets that had no known homologs in PDB: 38 compatible templates were identified by our approach and 66% of these hits yielded correctly predicted structures. This method scales up well and offers promising perspectives for structural annotations at the genomic level. It has been implemented in the form of a web-server that is freely available at http://www.bo-protscience.fr/forsa. PMID:25297700
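
To make the scoring idea concrete, the Python sketch below threads a query sequence onto a template written as a PB string and sums per-position scores taken from one amino acid table per PB. It is a toy under stated assumptions, not the published method: the log-odds form, the two-letter amino acid alphabet, the two PBs used and all counts are invented for illustration, and every count is assumed to be nonzero.

```python
import math

def log_odds(occurrence, background):
    """Turn one PB's amino acid occurrence counts into log-odds scores.
    Assumes every count is nonzero (no pseudocounts in this sketch)."""
    total = sum(occurrence.values())
    return {aa: math.log((n / total) / background[aa])
            for aa, n in occurrence.items()}

def thread_score(query, pb_seq, scores):
    """Score a gapless alignment of `query` against `pb_seq`, position by position."""
    return sum(scores[pb][aa] for aa, pb in zip(query, pb_seq))

def best_local_threading(query, pb_seq, scores):
    """Slide `query` along a longer PB sequence without gaps and
    return (offset, score) of the best-scoring window."""
    best = None
    for offset in range(len(pb_seq) - len(query) + 1):
        window = pb_seq[offset:offset + len(query)]
        s = thread_score(query, window, scores)
        if best is None or s > best[1]:
            best = (offset, s)
    return best

# Hypothetical counts for two PBs and a two-residue alphabet.
background = {"A": 0.5, "V": 0.5}
scores = {
    "m": log_odds({"A": 8, "V": 2}, background),  # toy helix-like block
    "d": log_odds({"A": 2, "V": 8}, background),  # toy strand-like block
}
offset, score = best_local_threading("AAVV", "mmddmm", scores)
print(offset, round(score, 3))
```

The abstract applies a Z-score cutoff to such scores. Computing it would need a background score distribution, which this sketch leaves out.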

  16. Use of a structural alphabet to find compatible folds for amino acid sequences.

    PubMed

    Mahajan, Swapnil; de Brevern, Alexandre G; Sanejouand, Yves-Henri; Srinivasan, Narayanaswamy; Offmann, Bernard

    2015-01-01

    The structural annotation of proteins with no detectable homologs of known 3D structure identified using sequence-search methods is a major challenge today. We propose an original method that computes the conditional probabilities for the amino-acid sequence of a protein to fit to known protein 3D structures using a structural alphabet, known as "Protein Blocks" (PBs). PBs constitute a library of 16 local structural prototypes that approximate every part of protein backbone structures. It is used to encode 3D protein structures into 1D PB sequences and to capture sequence to structure relationships. Our method relies on amino acid occurrence matrices, one for each PB, to score global and local threading of query amino acid sequences to protein folds encoded into PB sequences. It does not use any information from residue contacts or sequence-search methods or explicit incorporation of hydrophobic effect. The performance of the method was assessed with independent test datasets derived from SCOP 1.75A. With a Z-score cutoff that achieved 95% specificity (i.e., less than 5% false positives), global and local threading showed sensitivity of 64.1% and 34.2%, respectively. We further tested its performance on 57 difficult CASP10 targets that had no known homologs in PDB: 38 compatible templates were identified by our approach and 66% of these hits yielded correctly predicted structures. This method scales up well and offers promising perspectives for structural annotations at the genomic level. It has been implemented in the form of a web-server that is freely available at http://www.bo-protscience.fr/forsa. PMID:25297700

  17. Structure of the genetic code suggested by the hydropathy correlation between anticodons and amino acid residues

    NASA Astrophysics Data System (ADS)

    Farias, Sávio Torres De; Moreira, Carlos Henrique Costa; Guimarães, Romeu Cardoso

    2007-02-01

    The correlation between hydropathies of anticodons and amino acids, detected by other authors utilizing scales of amino acid molecules in solution, was improved with the utilization of scales of amino acid residues in proteins. Three partitions were discerned in the correlation plot with the principal dinucleotides of anticodons (pDiN, excluding the wobble position). (a) The set of outliers of the correlation: Gly-CC, Pro-GG, Ser-GA and Ser-CU. The amino acids are consistently small, hydro-apathetic, stabilizers of protein N-ends, preferred in aperiodic protein conformations and belong to synthetases class II. The pDiN sequences are representative of the homogeneous sector (triplets N RR and N YY), distinguished from the mixed sector (triplets N RY and N YR), that depict a 70% correspondence to the synthetases class II and I, respectively. The triplet pairs proposed to be responsible for the coherence in the set of outliers are of the palindromic kind, where the lateral bases are the same, C CC: G GG and A GA: U CU. This suggests that U CU previously belonged to Ser, adding to other indications that the attribution of Arg to Y CU was due to an expansion of the Arg-tRNA synthetase specificity. The other attributions produced two correlation sets. (b) One corresponds to the remaining pDiN of the homogeneous sector, containing both synthetase classes; its regression line overlapped the one formed by the remaining attributions to class II. (c) The other contains the pDiN of the mixed sector and produced steeper slopes, especially with the class I attributions. It is suggested that the correlation was established when the amino acid composition of the protein synthetases became progressively enriched and that the set of outliers were the earliest to have been fixed.

  18. Acoustic radiation force impulse (ARFI) imaging of zebrafish embryo by high-frequency coded excitation sequence.

    PubMed

    Park, Jinhyoung; Lee, Jungwoo; Lau, Sien Ting; Lee, Changyang; Huang, Ying; Lien, Ching-Ling; Kirk Shung, K

    2012-04-01

    Acoustic radiation force impulse (ARFI) imaging has been developed as a non-invasive method for quantitative illustration of tissue stiffness or displacement. Conventional ARFI imaging (2-10 MHz) has been implemented in commercial scanners for illustrating elastic properties of several organs. The image resolution, however, is too coarse to study mechanical properties of micro-sized objects such as cells. This article thus presents a high-frequency coded excitation ARFI technique, with the ultimate goal of displaying elastic characteristics of cellular structures. Tissue mimicking phantoms and zebrafish embryos are imaged with a 100-MHz lithium niobate (LiNbO₃) transducer, by cross-correlating tracked RF echoes with the reference. The phantom results show that the contrast of ARFI image (14 dB) with coded excitation is better than that of the conventional ARFI image (9 dB). The depths of penetration are 2.6 and 2.2 mm, respectively. The stiffness data of the zebrafish demonstrate that the envelope is harder than the embryo region. The temporal displacement change at the embryo and the chorion is as large as 36 and 3.6 μm. Consequently, this high-frequency ARFI approach may serve as a remote palpation imaging tool that reveals viscoelastic properties of small biological samples. PMID:22101757
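
    The tracking step mentioned above (cross-correlating tracked RF echoes with the reference) can be illustrated with a minimal sketch. It assumes an unnormalized, integer-lag cross-correlation; the function names, the sampling rate passed in, and the 1540 m/s sound speed are illustrative assumptions and are not taken from the article.

```python
def best_lag(reference, tracked, max_lag):
    """Return the integer sample lag that maximizes the unnormalized
    cross-correlation between a reference echo and a tracked echo."""
    def score(lag):
        # Sum products over the samples where both signals overlap.
        return sum(reference[i] * tracked[i + lag]
                   for i in range(len(reference))
                   if 0 <= i + lag < len(tracked))
    return max(range(-max_lag, max_lag + 1), key=score)


def lag_to_displacement_um(lag, sampling_rate_hz, sound_speed_m_s=1540.0):
    """Convert a pulse-echo lag in samples to axial displacement in micrometres.
    The factor 1/2 accounts for the round trip; 1540 m/s is an assumed
    soft-tissue sound speed, not a value from the article."""
    return lag / sampling_rate_hz * sound_speed_m_s / 2.0 * 1e6
```

    A practical tracker would normally interpolate around the correlation peak to resolve sub-sample shifts; this sketch stops at whole samples.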

  19. A Unified Mathematical Framework for Coding Time, Space, and Sequences in the Hippocampal Region

    PubMed Central

    MacDonald, Christopher J.; Tiganj, Zoran; Shankar, Karthik H.; Du, Qian; Hasselmo, Michael E.; Eichenbaum, Howard

    2014-01-01

    The medial temporal lobe (MTL) is believed to support episodic memory, vivid recollection of a specific event situated in a particular place at a particular time. There is ample neurophysiological evidence that the MTL computes location in allocentric space and more recent evidence that the MTL also codes for time. Space and time represent a similar computational challenge; both are variables that cannot be simply calculated from the immediately available sensory information. We introduce a simple mathematical framework that computes functions of both spatial location and time as special cases of a more general computation. In this framework, experience unfolding in time is encoded via a set of leaky integrators. These leaky integrators encode the Laplace transform of their input. The information contained in the transform can be recovered using an approximation to the inverse Laplace transform. In the temporal domain, the resulting representation reconstructs the temporal history. By integrating movements, the equations give rise to a representation of the path taken to arrive at the present location. By modulating the transform with information about allocentric velocity, the equations code for position of a landmark. Simulated cells show a close correspondence to neurons observed in various regions for all three cases. In the temporal domain, novel secondary analyses of hippocampal time cells verified several qualitative predictions of the model. An integrated representation of spatiotemporal context can be computed by taking conjunctions of these elemental inputs, leading to a correspondence with conjunctive neural representations observed in dorsal CA1. PMID:24672015
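
    The statement that a set of leaky integrators encodes the Laplace transform of its input can be checked with a small sketch. It assumes each integrator obeys dF/dt = -s F + f(t) with its own decay rate s; the function name, the rates, and the discrete update are illustrative assumptions, since the abstract does not give the equations.

```python
import math


def run_leaky_integrators(signal, rates, dt):
    """Drive one leaky integrator per decay rate s with the same input.
    Each step applies exact exponential decay over dt and then adds the
    input accumulated during that step (assumed form: dF/dt = -s*F + f)."""
    state = [0.0] * len(rates)
    for f_t in signal:
        state = [F * math.exp(-s * dt) + f_t * dt
                 for F, s in zip(state, rates)]
    return state
```

    For a unit impulse presented T seconds in the past, each integrator ends at roughly exp(-s T), which is the Laplace transform of a delta function delayed by T evaluated at s. The population therefore holds the elapsed time implicitly across its decay rates.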

  20. Exome-wide association analysis reveals novel coding sequence variants associated with lipid traits in Chinese.

    PubMed

    Tang, Clara S; Zhang, He; Cheung, Chloe Y Y; Xu, Ming; Ho, Jenny C Y; Zhou, Wei; Cherny, Stacey S; Zhang, Yan; Holmen, Oddgeir; Au, Ka-Wing; Yu, Haiyi; Xu, Lin; Jia, Jia; Porsch, Robert M; Sun, Lijie; Xu, Weixian; Zheng, Huiping; Wong, Lai-Yung; Mu, Yiming; Dou, Jingtao; Fong, Carol H Y; Wang, Shuyu; Hong, Xueyu; Dong, Liguang; Liao, Yanhua; Wang, Jiansong; Lam, Levina S M; Su, Xi; Yan, Hua; Yang, Min-Lee; Chen, Jin; Siu, Chung-Wah; Xie, Gaoqiang; Woo, Yu-Cho; Wu, Yangfeng; Tan, Kathryn C B; Hveem, Kristian; Cheung, Bernard M Y; Zöllner, Sebastian; Xu, Aimin; Eugene Chen, Y; Jiang, Chao Qiang; Zhang, Youyi; Lam, Tai-Hing; Ganesh, Santhi K; Huo, Yong; Sham, Pak C; Lam, Karen S L; Willer, Cristen J; Tse, Hung-Fat; Gao, Wei

    2015-01-01

    Blood lipids are important risk factors for coronary artery disease (CAD). Here we perform an exome-wide association study by genotyping 12,685 Chinese, using a custom Illumina HumanExome BeadChip, to identify additional loci influencing lipid levels. Single-variant association analysis on 65,671 single nucleotide polymorphisms reveals 19 loci associated with lipids at exome-wide significance (P < 2.69 × 10⁻⁷), including three Asian-specific coding variants in known genes (CETP p.Asp459Gly, PCSK9 p.Arg93Cys and LDLR p.Arg257Trp). Furthermore, missense variants at two novel loci, PNPLA3 p.Ile148Met and PKD1L3 p.Thr429Ser, also influence levels of triglycerides and low-density lipoprotein cholesterol, respectively. Another novel gene, TEAD2, is found to be associated with high-density lipoprotein cholesterol through gene-based association analysis. Most of these newly identified coding variants show suggestive association (P < 0.05) with CAD. These findings demonstrate that exome-wide genotyping on samples of non-European ancestry can identify additional population-specific possible causal variants, shedding light on novel lipid biology and CAD. PMID:26690388

  1. Non-Coding RNA: Sequence-Specific Guide for Chromatin Modification and DNA Damage Signaling

    PubMed Central

    Francia, Sofia

    2015-01-01

    Chromatin conformation shapes the environment in which our genome is transcribed into RNA. Transcription is a source of DNA damage, thus it often occurs concomitantly with DNA damage signaling. Growing amounts of evidence suggest that different types of RNAs can, independently from their protein-coding properties, directly affect chromatin conformation, transcription and splicing, as well as promote the activation of the DNA damage response (DDR) and DNA repair. Therefore, transcription paradoxically functions to both threaten and safeguard genome integrity. On the other hand, DNA damage signaling is known to modulate chromatin to suppress transcription of the surrounding genetic unit. It is thus intriguing to understand how transcription can modulate DDR signaling while, in turn, DDR signaling represses transcription of chromatin around the DNA lesion. An unexpected player in this field is the RNA interference (RNAi) machinery, which plays roles in transcription, splicing and chromatin modulation in several organisms. Non-coding RNAs (ncRNAs) and several protein factors involved in the RNAi pathway are well-known master regulators of chromatin, while only recent reports show their involvement in DDR. Here, we discuss the experimental evidence supporting the idea that ncRNAs act at the genomic loci from which they are transcribed to modulate chromatin, DDR signaling and DNA repair. PMID:26617633

  2. Exome-wide association analysis reveals novel coding sequence variants associated with lipid traits in Chinese

    PubMed Central

    Tang, Clara S.; Zhang, He; Cheung, Chloe Y. Y.; Xu, Ming; Ho, Jenny C. Y.; Zhou, Wei; Cherny, Stacey S.; Zhang, Yan; Holmen, Oddgeir; Au, Ka-Wing; Yu, Haiyi; Xu, Lin; Jia, Jia; Porsch, Robert M.; Sun, Lijie; Xu, Weixian; Zheng, Huiping; Wong, Lai-Yung; Mu, Yiming; Dou, Jingtao; Fong, Carol H. Y.; Wang, Shuyu; Hong, Xueyu; Dong, Liguang; Liao, Yanhua; Wang, Jiansong; Lam, Levina S. M.; Su, Xi; Yan, Hua; Yang, Min-Lee; Chen, Jin; Siu, Chung-Wah; Xie, Gaoqiang; Woo, Yu-Cho; Wu, Yangfeng; Tan, Kathryn C. B.; Hveem, Kristian; Cheung, Bernard M. Y.; Zöllner, Sebastian; Xu, Aimin; Eugene Chen, Y; Jiang, Chao Qiang; Zhang, Youyi; Lam, Tai-Hing; Ganesh, Santhi K.; Huo, Yong; Sham, Pak C.; Lam, Karen S. L.; Willer, Cristen J.; Tse, Hung-Fat; Gao, Wei

    2015-01-01

    Blood lipids are important risk factors for coronary artery disease (CAD). Here we perform an exome-wide association study by genotyping 12,685 Chinese, using a custom Illumina HumanExome BeadChip, to identify additional loci influencing lipid levels. Single-variant association analysis on 65,671 single nucleotide polymorphisms reveals 19 loci associated with lipids at exome-wide significance (P < 2.69 × 10⁻⁷), including three Asian-specific coding variants in known genes (CETP p.Asp459Gly, PCSK9 p.Arg93Cys and LDLR p.Arg257Trp). Furthermore, missense variants at two novel loci, PNPLA3 p.Ile148Met and PKD1L3 p.Thr429Ser, also influence levels of triglycerides and low-density lipoprotein cholesterol, respectively. Another novel gene, TEAD2, is found to be associated with high-density lipoprotein cholesterol through gene-based association analysis. Most of these newly identified coding variants show suggestive association (P < 0.05) with CAD. These findings demonstrate that exome-wide genotyping on samples of non-European ancestry can identify additional population-specific possible causal variants, shedding light on novel lipid biology and CAD. PMID:26690388

  3. Software scripts for quality checking of high-throughput nucleic acid sequencers.

    PubMed

    Lazo, G R; Tong, J; Miller, R; Hsia, C; Rausch, C; Kang, Y; Anderson, O D

    2001-06-01

    We have developed a graphical interface to allow the researcher to view and assess the quality of sequencing results using a series of program scripts developed to process data generated by automated sequencers. The scripts are written in the Perl programming language and are executable under the cgi-bin directory of a Web server environment. The scripts direct nucleic acid sequencing trace file data output from automated sequencers to be analyzed by the phred molecular biology program, and the results are displayed as graphical hypertext mark-up language (HTML) pages. The scripts are mainly designed to handle 96-well microtiter dish samples, but they are also able to read data from 384-well microtiter dishes 96 samples at a time. The scripts may be customized for different laboratory environments and computer configurations. Web links to the sources and discussion page are provided. PMID:11414222

  4. Indole-3-acetic acid: A widespread physiological code in interactions of fungi with other organisms

    PubMed Central

    Fu, Shih-Feng; Wei, Jyuan-Yu; Chen, Hung-Wei; Liu, Yen-Yu; Lu, Hsueh-Yu; Chou, Jui-Yu

    2015-01-01

    Plants as well as microorganisms, including bacteria and fungi, produce indole-3-acetic acid (IAA). IAA is the most common plant hormone of the auxin class and it regulates various aspects of plant growth and development. Thus, research is underway globally to exploit the potential for developing IAA-producing fungi for promoting plant growth and protection for sustainable agriculture. Phylogenetic evidence suggests that IAA biosynthesis evolved independently in bacteria, microalgae, fungi, and plants. Present studies show that IAA regulates the physiological response and gene expression in these microorganisms. The convergent evolution of IAA production leads to the hypothesis that natural selection might have favored IAA as a widespread physiological code in these microorganisms and their interactions. We summarize recent studies of IAA biosynthetic pathways and discuss the role of IAA in fungal ecology. PMID:26179718

  5. Identification of internal transcribed spacer sequence motifs in truffles: a first step toward their DNA bar coding.

    PubMed

    El Karkouri, Khalid; Murat, Claude; Zampieri, Elisa; Bonfante, Paola

    2007-08-01

    This work presents DNA sequence motifs from the internal transcribed spacer (ITS) of the nuclear rRNA repeat unit which are useful for the identification of five European and Asiatic truffles (Tuber magnatum, T. melanosporum, T. indicum, T. aestivum, and T. mesentericum). Truffles are edible mycorrhizal ascomycetes that show similar morphological characteristics but that have distinct organoleptic and economic values. A total of 36 out of 46 ITS1 or ITS2 sequence motifs have allowed an accurate in silico distinction of the five truffles to be made (i.e., by pattern matching and/or BLAST analysis on downloaded GenBank sequences and directly against GenBank databases). The motifs considered the intraspecific genetic variability of each species, including rare haplotypes, and assigned their respective species from either the ascocarps or ectomycorrhizas. The data indicate that short ITS1 or ITS2 motifs (≤ 50 bp in size) can be considered promising tools for truffle species identification. A dot blot hybridization analysis of T. magnatum and T. melanosporum compared with other close relatives or distant lineages allowed at least one highly specific motif to be identified for each species. These results were confirmed in a blind test which included new field isolates. The current work has provided a reliable new tool for a truffle oligonucleotide bar code and identification in ecological and evolutionary studies. PMID:17601808

  6. Efficient Nucleic Acid Extraction and 16S rRNA Gene Sequencing for Bacterial Community Characterization.

    PubMed

    Anahtar, Melis N; Bowman, Brittany A; Kwon, Douglas S

    2016-01-01

    There is a growing appreciation for the role of microbial communities as critical modulators of human health and disease. High throughput sequencing technologies have allowed for the rapid and efficient characterization of bacterial communities using 16S rRNA gene sequencing from a variety of sources. Although readily available tools for 16S rRNA sequence analysis have standardized computational workflows, sample processing for DNA extraction remains a continued source of variability across studies. Here we describe an efficient, robust, and cost-effective method for extracting nucleic acid from swabs. We also delineate downstream methods for 16S rRNA gene sequencing, including generation of sequencing libraries, data quality control, and sequence analysis. The workflow can accommodate multiple sample types, including stool and swabs collected from a variety of anatomical locations and host species. Additionally, recovered DNA and RNA can be separated and used for other applications, including whole genome sequencing or RNA-seq. The method described allows for a common processing approach for multiple sample types and accommodates downstream analysis of genomic, metagenomic and transcriptional information. PMID:27168460

  7. Efficient Nucleic Acid Extraction and 16S rRNA Gene Sequencing for Bacterial Community Characterization

    PubMed Central

    Anahtar, Melis N.; Bowman, Brittany A.; Kwon, Douglas S.

    2016-01-01

    There is a growing appreciation for the role of microbial communities as critical modulators of human health and disease. High throughput sequencing technologies have allowed for the rapid and efficient characterization of bacterial communities using 16S rRNA gene sequencing from a variety of sources. Although readily available tools for 16S rRNA sequence analysis have standardized computational workflows, sample processing for DNA extraction remains a continued source of variability across studies. Here we describe an efficient, robust, and cost-effective method for extracting nucleic acid from swabs. We also delineate downstream methods for 16S rRNA gene sequencing, including generation of sequencing libraries, data quality control, and sequence analysis. The workflow can accommodate multiple sample types, including stool and swabs collected from a variety of anatomical locations and host species. Additionally, recovered DNA and RNA can be separated and used for other applications, including whole genome sequencing or RNA-seq. The method described allows for a common processing approach for multiple sample types and accommodates downstream analysis of genomic, metagenomic and transcriptional information. PMID:27168460

  8. Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Patel, Kamlesh D [Ken]; SNL,

    2013-01-25

    Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June 2012 in Santa Fe, NM.

  9. Identification of small non-coding RNAs in the planarian Dugesia japonica via deep sequencing.

    PubMed

    Qin, Yun-Fei; Zhao, Jin-Mei; Bao, Zhen-Xia; Zhu, Zhao-Yu; Mai, Jia; Huang, Yi-Bo; Li, Jian-Biao; Chen, Ge; Lu, Ping; Chen, San-Jun; Su, Lin-Lin; Fang, Hui-Min; Lu, Ji-Ke; Zhang, Yi-Zhe; Zhang, Shou-Tao

    2012-05-01

    The freshwater planarian flatworm possesses an extraordinary ability to regenerate lost body parts after amputation, making it an ideal model organism for regeneration and stem cell biology. Recently, small RNAs (sRNAs) have attracted increasing attention and have been studied in many contexts, including regeneration and stem cell biology. In the current study, the large-scale cloning and sequencing of sRNAs from intact and regenerating planarian Dugesia japonica are reported. Sequence analysis shows that sRNAs between 18 nt and 40 nt are mainly microRNAs (miRNAs) and piRNAs. In addition, 209 conserved miRNAs and 12 novel miRNAs are identified. Notably, an improved target-screening method, based on the negative correlation between miRNAs and mRNAs, is adopted to improve target prediction accuracy. As for miRNAs, a diverse population of piRNAs and their changes between the two samples are also listed. The present study is the first to report on the important role of sRNAs during Dugesia japonica regeneration. PMID:22425900

  10. The amino acid sequence of ribonuclease U2 from Ustilago sphaerogena.

    PubMed Central

    Sato, S; Uchida, T

    1975-01-01

    1. RNAase (ribonuclease) U2, a purine-specific RNAase, was reduced, aminoethylated and hydrolysed with trypsin, chymotrypsin and thermolysin. On the basis of the analyses of the resulting peptides, the complete amino acid sequence of RNAase U2 was determined. 2. When the sequence was compared with the amino acid sequence of RNAase T1 (EC 3.1.4.8), the following regions were found to be similar in the two enzymes: Tyr-Pro-His-Gln-Tyr (38-42) in RNAase U2 and Tyr-Pro-His-Lys-Tyr (38-42) in RNAase T1, Glu-Phe-Pro-Leu-Val (61-65) in RNAase U2 and Glu-Trp-Pro-Ile-Leu (58-62) in RNAase T1, Asp-Arg-Val-Ile-Tyr-Gln (83-88) in RNAase U2 and Asp-Arg-Val-Phe-Asn (76-81) in RNAase T1 and Val-Thr-His-Thr-Gly-Ala (98-103) in RNAase U2 and Ile-Thr-His-Thr-Gly-Ala (90-95) in RNAase T1. All of the amino acid residues, histidine-40, glutamate-58, arginine-77 and histidine-92, which were found to play a crucial role in the biological activity of RNAase T1, were included in the regions cited here. 3. Detailed evidence for the amino acid sequence of the protein has been deposited as Supplementary Publication SUP 50041 (33 pages) at the British Library (Lending Division) (formerly the National Lending Library for Science and Technology), Boston Spa, Yorks. LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1975), 145, 5. PMID:1156364
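
    The regional similarities listed above can be quantified with a minimal sketch that counts identical positions between two regions of equal length. The function name is hypothetical, and the comparison is a plain ungapped, position-by-position match, which is an assumption about how the regions line up rather than the authors' alignment method.

```python
def residue_identity(region_a, region_b):
    """Count identical positions between two equal-length regions written
    as hyphen-separated three-letter residue codes.
    Returns (number of matches, region length)."""
    a = region_a.split("-")
    b = region_b.split("-")
    if len(a) != len(b):
        raise ValueError("regions must have the same number of residues")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, len(a)
```

    Applied to the first pair (residues 38-42 in both enzymes), this gives 4 identical residues out of 5. The Asp-Arg-Val pair is left out of any such check here because the two regions are listed with different numbers of residues.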

  11. Deduced amino acid sequence of human pulmonary surfactant proteolipid: SPL(pVal)

    SciTech Connect

    Whitsett, J.A.; Glasser, S.W.; Korfhagen, T.R.; Weaver, T.E.; Clark, J.; Pilot-Matias, T.; Meuth, J.; Fox, J.L.

    1987-05-01

    A hydrophobic, proteolipid-like protein of Mr 6500 was isolated from ether/ethanol extracts of human, canine and bovine pulmonary surfactant. Amino acid composition of the protein demonstrated a remarkable abundance of hydrophobic residues, particularly valine and leucine. The N-terminal amino acid sequence of the human protein was determined: N-Leu-Ile-Pro-Cys-Cys-Pro-Val-Asn-Leu-Lys-Arg-Leu-Leu-Ile-Val4... An oligonucleotide probe was used to screen an adult human lung cDNA library and resulted in detection of cDNA clones with a predicted amino acid sequence closely matching the N-terminal amino acid sequence of the human peptide. SPL(pVal) was found within the reading frame of a larger peptide. SPL(pVal) results from proteolytic processing of a larger preprotein. Northern blot analysis detected a single 1.0 kilobase SPL(pVal) RNA which was less abundant in fetal than in adult lung. Mixtures of purified canine and bovine SPL(pVal) and synthetic phospholipids display properties of rapid adsorption and surface tension lowering activity characteristic of surfactant. Human SPL(pVal) is a pulmonary surfactant proteolipid which may therefore be useful in combination with phospholipids and/or other surfactant proteins for the treatment of surfactant deficiency such as hyaline membrane disease in newborn infants.

  12. Identification of the ovine KAP11-1 gene (KRTAP11-1) and genetic variation in its coding sequence.

    PubMed

    Gong, Hua; Zhou, Huitong; Dyer, Jolon M; Hickford, Jon G H

    2011-11-01

    Keratin-associated proteins (KAPs) are a structural component of the wool fibre and form the matrix between the keratin intermediate filaments (KIFs). The gene encoding high sulphur-protein KAP11-1 has been identified in human, cattle and mouse, but not yet in sheep, despite the economic importance of wool. In this study, PCR using primers based on the cattle KAP11-1 gene sequence produced an amplicon of the expected size with sheep DNA. Upon using PCR-Single Stranded Conformational Polymorphism (PCR-SSCP) analysis in 260 sheep, six different PCR-SSCP patterns were detected. Either one or a combination of two banding patterns was observed for each sheep, suggesting they were either homozygous or heterozygous for this gene. Sequencing of the amplicons confirmed the occurrence of six DNA sequences. All of these were unique, and the greatest homology was with KRTAP11-1 sequences from cattle, human and mouse, suggesting that they were derived from the ovine KAP11-1 gene and were allelic variants. The ovine KAP11-1 gene had an open reading frame of 477 nucleotides encoding 159 amino acids. The putative protein was rich in serine, cysteine, and threonine which account for 18.2-18.9, 12.6 and 12.0 mol%, respectively. Of these, approximately 20 of the serine and threonine residues might be phosphorylated. Five nucleotide substitutions were identified, and one was non-synonymous and would result in an amino acid change at a potential phosphorylation site. The genetic variation found in KRTAP11-1 may influence its expression, protein structure, and/or post-translational modifications, and consequently affect wool fibre structure and wool traits. PMID:21400094
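
    The figures in this abstract can be cross-checked with simple arithmetic, sketched below. The function names are hypothetical, and the sketch takes the reported numbers at face value: the abstract does not say whether the 477 nucleotides include a stop codon, so the code only divides by three.

```python
def codons_in_orf(orf_length_nt):
    """Number of codons in an open reading frame of the given length."""
    if orf_length_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_length_nt // 3


def residues_from_mol_percent(mol_percent, protein_length):
    """Approximate residue count implied by a mol% composition value."""
    return round(mol_percent / 100.0 * protein_length)
```

    With these helpers, 477 nucleotides correspond to 159 codons, and the reported 12.6 mol% cysteine and 12.0 mol% threonine in a 159-residue protein correspond to about 20 and 19 residues respectively.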

  13. Human liver type pyruvate kinase: complete amino acid sequence and the expression in mammalian cells.

    PubMed Central

    Tani, K; Fujii, H; Nagata, S; Miwa, S

    1988-01-01

    Pyruvate kinase (PK) has four isozymes (L, R, M1, M2) that are encoded by two different genes. Among these isozymes, abnormalities of liver (L)-type PK are considered to be associated with hereditary nonspherocytic hemolytic anemia in humans. We isolated and determined the full-length sequence of human L-type PK cDNA. The cDNA contains 1629 base pairs encoding 543 amino acids, 68 base pairs of 5'-noncoding sequence, and 734 base pairs of 3'-noncoding sequence. The similarity between human and rat L-type PK was 86.9% at the nucleotide sequence level and 92.4% at the amino acid sequence level. The full-length L-type PK cDNA was placed under the promoter of simian virus 40 and introduced into monkey COS cells. Human L-type PK activity was detected in the extract of COS cells by the classical PK electrophoresis method. PMID:3126495

  14. Human liver type pyruvate kinase: Complete amino acid sequence and the expression in mammalian cells

    SciTech Connect

    Tani, Kenzaburo; Nagata, Shigekazu; Fujii, Hisaichi; Miwa, Shiro

    1988-03-01

    Pyruvate kinase (PK) has four isozymes (L, R, M1, M2) that are encoded by two different genes. Among these isozymes, abnormalities of liver (L)-type PK are considered to be associated with hereditary nonspherocytic hemolytic anemia in humans. The authors isolated and determined the full-length sequence of human L-type PK cDNA. The cDNA contains 1,629 base pairs encoding 543 amino acids, 68 base pairs of 5'-noncoding sequence, and 734 base pairs of 3'-noncoding sequence. The similarity between human and rat L-type PK was 86.9% at the nucleotide sequence level and 92.4% at the amino acid sequence level. The full-length L-type PK cDNA was placed under the promoter of simian virus 40 and introduced into monkey COS cells. Human L-type PK activity was detected in the extract of COS cells by the classical PK electrophoresis method.

  15. Color differences among feral pigeons (Columba livia) are not attributable to sequence variation in the coding region of the melanocortin-1 receptor gene (MC1R)

    PubMed Central

    2013-01-01

    Background: Genetic variation at the melanocortin-1 receptor (MC1R) gene is correlated with melanin color variation in many birds. Feral pigeons (Columba livia) show two major melanin-based colorations: a red coloration due to pheomelanic pigment and a black coloration due to eumelanic pigment. Furthermore, within each color type, feral pigeons display continuous variation in the amount of melanin pigment present in the feathers, with individuals varying from pure white to a full dark melanic color. Coloration is highly heritable and it has been suggested that it is under natural or sexual selection, or both. Our objective was to investigate whether MC1R allelic variants are associated with plumage color in feral pigeons. Findings: We sequenced 888 bp of the coding sequence of MC1R among pigeons varying both in the type, eumelanin or pheomelanin, and the amount of melanin in their feathers. We detected 10 non-synonymous substitutions and 2 synonymous substitutions, but none of them was associated with a plumage type. It remains possible that non-synonymous substitutions that influence coloration are present in the short MC1R fragment that we did not sequence, but this seems unlikely because we analyzed the entire functionally important region of the gene. Conclusions: Our results show that color differences among feral pigeons are probably not attributable to amino acid variation at the MC1R locus. Therefore, variation in regulatory regions of MC1R or variation in other genes may be responsible for the color polymorphism of feral pigeons. PMID:23915680

  16. Molecular cytogenetics by polymerase catalyzed amplification or in situ labelling of specific nucleic acid sequences

    SciTech Connect

    Bolund, L.; Brandt, C.; Hindkjaer, J.; Koch, J.; Koelvraa, S.; Pedersen, S.

    1993-01-01

    The Polymerase Chain Reaction (PCR) can be performed on isolated cells or chromosomes and the product can be analyzed by DNA technology or by FISH to test metaphases. The authors have good experiences analyzing aberrant chromosomes by FACS sorting, PCR with degenerated primers and painting of test metaphases with the PCR product. They also utilize polymerases for PRimed IN Situ labelling (PRINS) of specific nucleic acid sequences. In PRINS oligonucleotides are hybridized to their target sequences and labeled nucleotides are incorporated at the site of hybridization with the oligonucleotide as primer. PRINS may eventually allow the study of individual genes, gene expression and even somatic mutations (in mRNA) in single cells.

  17. DNA Cloning of Plasmodium falciparum Circumsporozoite Gene: Amino Acid Sequence of Repetitive Epitope

    NASA Astrophysics Data System (ADS)

    Enea, Vincenzo; Ellis, Joan; Zavala, Fidel; Arnot, David E.; Asavanich, Achara; Masuda, Aoi; Quakyi, Isabella; Nussenzweig, Ruth S.

    1984-08-01

    A clone of complementary DNA encoding the circumsporozoite (CS) protein of the human malaria parasite Plasmodium falciparum has been isolated by screening an Escherichia coli complementary DNA library with a monoclonal antibody to the CS protein. The DNA sequence of the complementary DNA insert encodes a four-amino acid sequence: proline-asparagine-alanine-asparagine, tandemly repeated 23 times. The CS β-lactamase fusion protein specifically binds monoclonal antibodies to the CS protein and inhibits the binding of these antibodies to native Plasmodium falciparum CS protein. These findings provide a basis for the development of a vaccine against Plasmodium falciparum malaria.
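
    The repeat structure described here (a four-residue unit repeated 23 times in tandem) can be made concrete with a short sketch. The one-letter spelling PNAN follows the residue order given in the abstract; the function name and the brute-force scan are illustrative assumptions, not the authors' method.

```python
def longest_tandem_run(sequence, unit):
    """Return the largest number of back-to-back copies of `unit`
    found anywhere in `sequence`."""
    if not unit:
        raise ValueError("unit must be a non-empty string")
    best = 0
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Walk forward one unit at a time while copies keep matching.
        while sequence.startswith(unit, pos):
            count += 1
            pos += len(unit)
        best = max(best, count)
    return best
```

    A 23-fold repeat of a four-residue unit spans 92 residues, which requires at least 276 nucleotides of coding sequence.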

  18. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F.W.

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient. 2 figs.
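
    To put the quoted library sizes in context, the sketch below compares them with the number of distinct sequences possible at each primer length, which is 4 to the power of the length. The function names are hypothetical; only the library sizes come from the abstract.

```python
def kmer_space(k):
    """Number of distinct DNA sequences of length k (four bases per position)."""
    return 4 ** k


def library_coverage(library_size, k):
    """Fraction of all possible k-mers represented by a library of the given size."""
    return library_size / kmer_space(k)
```

    For example, `library_coverage(10000, 8)` is about 0.15, so the octamer library mentioned above would hold roughly 15% of the 65,536 possible octamers. The nonamer and decamer libraries cover smaller fractions of their much larger sequence spaces.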

  19. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F. William

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient.

  20. Partial amino acid sequence of apolipoprotein(a) shows that it is homologous to plasminogen

    SciTech Connect

    Eaton, D.L.; Fless, G.M.; Kohr, W.J.; McLean, J.W.; Xu, Q.T.; Miller, C.G.; Lawn, R.M.; Scanu, A.M.

    1987-05-01

    Apolipoprotein(a) (apo(a)) is a glycoprotein with Mr approx. 280,000 that is disulfide linked to apolipoprotein B in lipoprotein(a) particles. Elevated plasma levels of lipoprotein(a) are correlated with atherosclerosis. Partial amino acid sequence of apo(a) shows that it has striking homology to plasminogen. Plasminogen is a plasma serine protease zymogen that consists of five homologous and tandemly repeated domains called kringles and a trypsin-like protease domain. The amino-terminal sequence obtained for apo(a) is homologous to the beginning of kringle 4 but not the amino terminus of plasminogen. Apo(a) was subjected to limited proteolysis by trypsin or V8 protease, and fragments generated were isolated and sequenced. Sequences obtained from several of these fragments are highly (77-100%) homologous to plasminogen residues 391-421, which reside within kringle 4. Analysis of these internal apo(a) sequences revealed that apo(a) may contain at least two kringle 4-like domains. A sequence obtained from another tryptic fragment also shows homology to the end of kringle 4 and the beginning of kringle 5. Sequence data obtained from the two tryptic fragments shows homology with the protease domain of plasminogen. One of these sequences is homologous to the sequences surrounding the activation site of plasminogen. Plasminogen is activated by the cleavage of a specific arginine residue by urokinase and tissue plasminogen activator; however, the corresponding site in apo(a) is a serine that would not be cleaved by tissue plasminogen activator or urokinase. Using a plasmin-specific assay, no proteolytic activity could be demonstrated for lipoprotein(a) particles. These results suggest that apo(a) contains kringle-like domains and an inactive protease domain.

  1. Most Used Codons per Amino Acid and per Genome in the Code of Man Compared to Other Organisms According to the Rotating Circular Genetic Code

    PubMed Central

    Castro-Chavez, Fernando

    2011-01-01

    My previous theoretical research shows that the rotating circular genetic code is a viable tool that makes it easier to distinguish the rules of variation applied to the amino acid exchange; it presents a precise and positional bio-mathematical balance of codons, according to the amino acids they encode. Here, I demonstrate that when using the conventional or classic circular genetic code, a clearer pattern for the human codon usage per amino acid and per genome emerges. The most used human codons per amino acid were the ones ending with the three hydrogen bond nucleotides: C for 12 amino acids and G for the remaining 8, plus one codon for arginine ending in A that was used with approximately the same frequency as the one ending in G for this same amino acid (plus *). The most used codons in man fall almost all the time at the rightmost position, clockwise, ending either in C or in G within the circular genetic code. The human codon usage per genome is compared to other organisms such as fruit flies (Drosophila melanogaster), squid (Loligo pealei), and many others. The biosemiotic codon usage of each genomic population or ‘Theme’ is equated to a ‘molecular language’. The C/U choice or difference, and the G/A difference in the third nucleotide of the most used codons per amino acid are illustrated by comparing the most used codons per genome in humans and squids. The human distribution in the third position of most used codons is a 12-8-2, C-G-A, nucleotide ending signature, while the squid distribution in the same position was odd, or uneven: 13-6-3, U-A-G, as its nucleotide ending signature. These findings may help to design computational tools to compare human genomes, to determine the exchangeability between compatible codons and amino acids, and for the early detection of incompatible changes leading to hereditary diseases. PMID:22997484
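
    The "most used codon per amino acid" and "nucleotide ending signature" described in this abstract can be illustrated with a minimal sketch. The codon table below is only a small, correct subset of the standard genetic code, and the function names and the toy sequence are hypothetical; none of them come from the paper.

```python
from collections import Counter, defaultdict

# Small subset of the standard genetic code (illustrative only).
CODON_TO_AA = {
    "CTG": "Leu", "CTC": "Leu", "TTA": "Leu",
    "GCC": "Ala", "GCT": "Ala",
    "AAG": "Lys", "AAA": "Lys",
}

def most_used_codons(cds):
    """Return {amino acid: most frequent codon} for an in-frame coding sequence."""
    counts = defaultdict(Counter)
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        aa = CODON_TO_AA.get(codon)
        if aa is not None:  # skip codons outside the toy table
            counts[aa][codon] += 1
    return {aa: c.most_common(1)[0][0] for aa, c in counts.items()}

def third_base_signature(top_codons):
    """Count how many top codons end in each nucleotide."""
    return Counter(codon[2] for codon in top_codons.values())

# Hypothetical toy sequence: 10 codons covering Leu, Ala and Lys.
toy_cds = "CTGCTGCTCTTAGCCGCCGCTAAGAAGAAA"
top = most_used_codons(toy_cds)
signature = third_base_signature(top)
```

    Applied to a full genome-wide codon table, the same tally would produce a signature of the kind the abstract reports (for example 12-8-2 for C-G-A in humans); the toy input here only demonstrates the counting.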

  2. On human disease-causing amino acid variants: statistical study of sequence and structural patterns

    PubMed Central

    Alexov, Emil

    2015-01-01

    Statistical analysis was carried out on a large set of naturally occurring human amino acid variations and it was demonstrated that there is a preference for some amino acid substitutions to be associated with diseases. At an amino acid sequence level, it was shown that the disease-causing variants frequently involve drastic changes of amino acid physico-chemical properties of proteins such as charge, hydrophobicity and geometry. Structural analysis of variants involved in diseases and being frequently observed in human population showed similar trends: disease-causing variants tend to cause more changes of hydrogen bond network and salt bridges as compared with harmless amino acid mutations. Analysis of thermodynamics data reported in literature, both experimental and computational, indicated that disease-causing variants tend to destabilize proteins and their interactions, which prompted us to investigate the effects of amino acid mutations on large databases of experimentally measured energy changes in unrelated proteins. Although the experimental datasets were neither linked to diseases nor exclusive to human proteins, the observed trends were the same: amino acid mutations tend to destabilize proteins and their interactions. Having in mind that structural and thermodynamics properties are interrelated, it is pointed out that any large change of any of them is anticipated to cause a disease. PMID:25689729

  3. Self-sequencing of amino acids and origins of polyfunctional protocells.

    PubMed

    Fox, S W

    1984-01-01

    The primal role of the origins of proteins in molecular evolution is discussed. On the basis of this premise, the significance of the experimentally established self-sequencing of amino acids under simulated geological conditions is explained as due to the fact that the products are highly nonrandom and accordingly contain many kinds of information. When such thermal proteins are aggregated into laboratory protocells, an action that occurs readily, the resultant protocells also contain many kinds of information. Residue-by-residue order, enzymic activities, and lipid quality accordingly occur within each preparation of proteinoid (thermal protein). In this paper are reviewed briefly the phenomenon of self-sequencing of amino acids, its relationship to evolutionary processes, other significance of such self-ordering, and the experimental evidence for original polyfunctional protocells. PMID:6462684

  4. Self-Sequencing of Amino Acids and Origins of Polyfunctional Protocells

    NASA Astrophysics Data System (ADS)

    Fox, Sidney W.

    1984-12-01

    The primal role of the origins of proteins in molecular evolution is discussed. On the basis of this premise, the significance of the experimentally established self-sequencing of amino acids under simulated geological conditions is explained as due to the fact that the products are highly nonrandom and accordingly contain many kinds of information. When such thermal proteins are aggregated into laboratory protocells, an action that occurs readily, the resultant protocells also contain many kinds of information. Residue-by-residue order, enzymic activities, and lipid quality accordingly occur within each preparation of proteinoid (thermal protein). In this paper are reviewed briefly the phenomenon of self-sequencing of amino acids, its relationship to evolutionary processes, other significance of such self-ordering, and the experimental evidence for original polyfunctional protocells.

  5. A base-sequence-modulated Golay code improves the excitation and measurement of ultrasonic guided waves in long bones.

    PubMed

    Song, Xiaojun; Ta, Dean; Wang, Weiqi

    2012-11-01

    Researchers are interested in using ultrasonic guided waves (GWs) to assess long bones. However, GWs suffer high attenuation when they propagate in long bones, resulting in a low SNR. To overcome this limitation, this paper introduces a base-sequence-modulated Golay code (BSGC) to produce larger amplitude and improve the SNR in the ultrasound evaluation of long bones. A 16-bit Golay code was used for excitation in computer simulation. The decoded GWs and the traditional GWs, which were generated by a single pulse, agreed well after decoding the received signals, and the SNR was improved by 26.12 dB. In the experiments using bovine bones, the BSGC excitation produced the amplitudes which were at least 237 times greater than those produced by a single pulse excitation. The BSGC excitation also allowed the GWs to be received over a longer distance between two transducers. The results suggest the BSGC excitation has the potential to measure GWs and assess long bones. PMID:23192823
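
    The abstract does not describe the base-sequence modulation or the decoding step, so the following is only a sketch of the general property that makes Golay codes attractive for excitation: the autocorrelations of a complementary pair sum to a single peak with zero sidelobes. The function names are hypothetical, and the standard concatenation recursion is assumed as the way to build a 16-bit pair.

```python
def golay_pair(n_bits):
    """Build a complementary Golay pair of length n_bits (a power of two)
    with the standard recursion a -> a|b, b -> a|(-b)."""
    if n_bits < 1 or n_bits & (n_bits - 1):
        raise ValueError("n_bits must be a power of two")
    a, b = [1], [1]
    while len(a) < n_bits:
        a, b = a + b, a + [-x for x in b]
    return a, b

def autocorrelation(seq):
    """Aperiodic autocorrelation of seq for lags 0 .. len(seq) - 1."""
    n = len(seq)
    return [sum(seq[i] * seq[i + lag] for i in range(n - lag))
            for lag in range(n)]

a, b = golay_pair(16)
# Sidelobes of the two autocorrelations cancel; only lag 0 survives.
acf_sum = [x + y for x, y in zip(autocorrelation(a), autocorrelation(b))]
```

    For a pair of length 16 the summed autocorrelation is 32 at lag 0 and 0 at every other lag, which is why correlating each received signal with its own code and adding the results concentrates the transmitted energy into one pulse.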

  6. Strong conservation of non-coding sequences during vertebrates evolution: potential involvement in post-transcriptional regulation of gene expression.

    PubMed Central

    Duret, L; Dorkeld, F; Gautier, C

    1993-01-01

    Comparison of nucleotide sequences from different classes of vertebrates that diverged more than 300 million years ago, revealed the existence of highly conserved regions (HCRs) with more than 70% similarity over 100 to 1450 nt in non-coding parts of genes. Such a conservation is unexpected because it is much longer and stronger than what is necessary for specifying the binding of a regulatory protein. HCRs are relatively frequent, particularly in genes that are essential to cell life. In multigene families, conserved regions are specific of each isotype and are probably involved in the control of their specific pattern of expression. Studying HCRs distribution within genes showed that functional constraints are generally much stronger in 3'-non-coding regions than in promoters or introns. The 3'-HCRs are particularly A + T-rich and are always located in the transcribed untranslated regions of genes, which suggests that they are involved in post-transcriptional processes. However, current knowledge of mechanisms that regulate mRNA export, localisation, translation, or degradation is not sufficient to explain the strong functional constraints that we have characterised. PMID:8506129
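
    The criterion quoted above (more than 70% similarity over at least 100 nt) can be sketched as a sliding-window identity scan. This assumes the two sequences are already aligned to equal length with "-" for gaps; the function name and toy sequences are hypothetical, and the authors' actual procedure is not given in the abstract.

```python
def conserved_windows(seq_a, seq_b, window=100, min_identity=0.70):
    """Return start positions of windows whose identity exceeds min_identity.

    seq_a and seq_b are assumed to be pre-aligned (equal length, '-' = gap).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for start in range(len(seq_a) - window + 1):
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        # A gap never counts as a match, even opposite another gap.
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        if matches / window > min_identity:
            hits.append(start)
    return hits

# Toy example with a 5-nt window; the last two positions differ.
toy_hits = conserved_windows("ACGTACGTAC", "ACGTACGTTT", window=5)
```

    Overlapping hits would normally be merged into maximal regions before their lengths are compared with the 100 to 1450 nt range reported in the abstract.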

  7. Enhanced Gene Expression Rather than Natural Polymorphism in Coding Sequence of the OsbZIP23 Determines Drought Tolerance and Yield Improvement in Rice Genotypes

    PubMed Central

    Dey, Avishek; Samanta, Milan Kumar; Gayen, Srimonta; Sen, Soumitra K.; Maiti, Mrinal K.

    2016-01-01

    Drought is one of the major limiting factors for productivity of crops including rice (Oryza sativa L.). Understanding the role of allelic variations of key regulatory genes involved in stress-tolerance is essential for developing an effective strategy to combat drought. The bZIP transcription factors play a crucial role in abiotic-stress adaptation in plants via abscisic acid (ABA) signaling pathway. The present study aimed to search for allelic polymorphism in the OsbZIP23 gene across selected drought-tolerant and drought-sensitive rice genotypes, and to characterize the new allele through overexpression (OE) and gene-silencing (RNAi). Analyses of the coding DNA sequence (CDS) of the cloned OsbZIP23 gene revealed single nucleotide polymorphism at four places and a 15-nucleotide deletion at one place. The single-copy OsbZIP23 gene is expressed at relatively higher level in leaf tissues of drought-tolerant genotypes, and its abundance is more in reproductive stage. Cloning and sequence analyses of the OsbZIP23-promoter from drought-tolerant O. rufipogon and drought-sensitive IR20 cultivar showed variation in the number of stress-responsive cis-elements and a 35-nucleotide deletion at 5’-UTR in IR20. Analysis of the GFP reporter gene function revealed that the promoter activity of O. rufipogon is comparatively higher than that of IR20. The overexpression of any of the two polymorphic forms (1083 bp and 1068 bp CDS) of OsbZIP23 improved drought tolerance and yield-related traits significantly by retaining higher content of cellular water, soluble sugar and proline; and exhibited decrease in membrane lipid peroxidation in comparison to RNAi lines and non-transgenic plants. The OE lines showed higher expression of target genes-OsRab16B, OsRab21 and OsLEA3-1 and increased ABA sensitivity; indicating that OsbZIP23 is a positive transcriptional-regulator of the ABA-signaling pathway. Taken together, the present study concludes that the enhanced gene expression rather

  8. Enhanced Gene Expression Rather than Natural Polymorphism in Coding Sequence of the OsbZIP23 Determines Drought Tolerance and Yield Improvement in Rice Genotypes.

    PubMed

    Dey, Avishek; Samanta, Milan Kumar; Gayen, Srimonta; Sen, Soumitra K; Maiti, Mrinal K

    2016-01-01

    Drought is one of the major limiting factors for productivity of crops including rice (Oryza sativa L.). Understanding the role of allelic variations of key regulatory genes involved in stress-tolerance is essential for developing an effective strategy to combat drought. The bZIP transcription factors play a crucial role in abiotic-stress adaptation in plants via abscisic acid (ABA) signaling pathway. The present study aimed to search for allelic polymorphism in the OsbZIP23 gene across selected drought-tolerant and drought-sensitive rice genotypes, and to characterize the new allele through overexpression (OE) and gene-silencing (RNAi). Analyses of the coding DNA sequence (CDS) of the cloned OsbZIP23 gene revealed single nucleotide polymorphism at four places and a 15-nucleotide deletion at one place. The single-copy OsbZIP23 gene is expressed at relatively higher level in leaf tissues of drought-tolerant genotypes, and its abundance is more in reproductive stage. Cloning and sequence analyses of the OsbZIP23-promoter from drought-tolerant O. rufipogon and drought-sensitive IR20 cultivar showed variation in the number of stress-responsive cis-elements and a 35-nucleotide deletion at 5'-UTR in IR20. Analysis of the GFP reporter gene function revealed that the promoter activity of O. rufipogon is comparatively higher than that of IR20. The overexpression of any of the two polymorphic forms (1083 bp and 1068 bp CDS) of OsbZIP23 improved drought tolerance and yield-related traits significantly by retaining higher content of cellular water, soluble sugar and proline; and exhibited decrease in membrane lipid peroxidation in comparison to RNAi lines and non-transgenic plants. The OE lines showed higher expression of target genes-OsRab16B, OsRab21 and OsLEA3-1 and increased ABA sensitivity; indicating that OsbZIP23 is a positive transcriptional-regulator of the ABA-signaling pathway. Taken together, the present study concludes that the enhanced gene expression rather than

  9. The non-coding RNA composition of the mitotic chromosome by 5′-tag sequencing

    PubMed Central

    Meng, Yicong; Yi, Xianfu; Li, Xinhui; Hu, Chuansheng; Wang, Ju; Bai, Ling; Czajkowsky, Daniel M.; Shao, Zhifeng

    2016-01-01

    Mitotic chromosomes are one of the most commonly recognized sub-cellular structures in eukaryotic cells. Yet basic information necessary to understand their structure and assembly, such as their composition, is still lacking. Recent proteomic studies have begun to fill this void, identifying hundreds of RNA-binding proteins bound to mitotic chromosomes. However, by contrast, there are only two RNA species (U3 snRNA and rRNA) that are known to be associated with the mitotic chromosome, suggesting that there are many mitotic chromosome-associated RNAs (mCARs) not yet identified. Here, using a targeted protocol based on 5′-tag sequencing to profile the mammalian mCAR population, we report the identification of 1279 mCARs, the majority of which are ncRNAs, including lncRNAs that exhibit greater conservation across 60 vertebrate species than the entire population of lncRNAs. There is also a significant enrichment of snoRNAs and specific SINE RNAs. Finally, ∼40% of the mCARs are presently unannotated, many of which are as abundant as the annotated mCARs, suggesting that there are also many novel ncRNAs in the mCARs. Overall, the mCARs identified here, together with the previous proteomic and genomic data, constitute the first comprehensive catalogue of the molecular composition of the eukaryotic mitotic chromosomes. PMID:27016738

  10. Bipartite geminivirus host adaptation determined cooperatively by coding and noncoding sequences of the genome.

    PubMed

    Petty, I T; Carter, S C; Morra, M R; Jeffrey, J L; Olivey, H E

    2000-11-25

    Bipartite geminiviruses are small, plant-infecting viruses with genomes composed of circular, single-stranded DNA molecules, designated A and B. Although they are closely related genetically, individual bipartite geminiviruses frequently exhibit host-specific adaptation. Two such viruses are bean golden mosaic virus (BGMV) and tomato golden mosaic virus (TGMV), which are well adapted to common bean (Phaseolus vulgaris) and Nicotiana benthamiana, respectively. In previous studies, partial host adaptation was conferred on BGMV-based or TGMV-based hybrid viruses by separately exchanging open reading frames (ORFs) on DNA A or DNA B. Here we analyzed hybrid viruses in which all of the ORFs on both DNAs were exchanged except for AL1, which encodes a protein with strictly virus-specific activity. These hybrid viruses exhibited partial transfer of host-adapted phenotypes. In contrast, exchange of noncoding regions (NCRs) upstream from the AR1 and BR1 ORFs did not confer any host-specific gain of function on hybrid viruses. However, when the exchangeable ORFs and NCRs from TGMV were combined in a single BGMV-based hybrid virus, complete transfer of TGMV-like adaptation to N. benthamiana was achieved. Interestingly, the reciprocal TGMV-based hybrid virus displayed only partial gain of function in bean. This may be, in part, the result of defective virus-specific interactions between TGMV and BGMV sequences present in the hybrid, although a potential role in adaptation to bean for additional regions of the BGMV genome cannot be ruled out. PMID:11080490

  11. The non-coding RNA composition of the mitotic chromosome by 5'-tag sequencing.

    PubMed

    Meng, Yicong; Yi, Xianfu; Li, Xinhui; Hu, Chuansheng; Wang, Ju; Bai, Ling; Czajkowsky, Daniel M; Shao, Zhifeng

    2016-06-01

    Mitotic chromosomes are one of the most commonly recognized sub-cellular structures in eukaryotic cells. Yet basic information necessary to understand their structure and assembly, such as their composition, is still lacking. Recent proteomic studies have begun to fill this void, identifying hundreds of RNA-binding proteins bound to mitotic chromosomes. However, by contrast, there are only two RNA species (U3 snRNA and rRNA) that are known to be associated with the mitotic chromosome, suggesting that there are many mitotic chromosome-associated RNAs (mCARs) not yet identified. Here, using a targeted protocol based on 5'-tag sequencing to profile the mammalian mCAR population, we report the identification of 1279 mCARs, the majority of which are ncRNAs, including lncRNAs that exhibit greater conservation across 60 vertebrate species than the entire population of lncRNAs. There is also a significant enrichment of snoRNAs and specific SINE RNAs. Finally, ∼40% of the mCARs are presently unannotated, many of which are as abundant as the annotated mCARs, suggesting that there are also many novel ncRNAs in the mCARs. Overall, the mCARs identified here, together with the previous proteomic and genomic data, constitute the first comprehensive catalogue of the molecular composition of the eukaryotic mitotic chromosomes. PMID:27016738

  12. Sequence of morphological transitions in two-dimensional pattern growth from aqueous ascorbic acid solutions.

    PubMed

    Paranjpe, A S

    2002-08-12

    A sequence of morphological transitions in two-dimensional dehydration patterns of aqueous solutions of ascorbic acid is observed with humidity as a control parameter. Change in morphology occurs due to humidity induced variation in the concentration of the metastable supersaturated solution phase formed after initial solvent evaporation. As percent humidity is varied from 40 to 80, patterns change from compact circular --> radial --> density modulated radial (a new morphology) --> density modulated circular --> density modulated dendritic (a new morphology) --> dense branching. PMID:12190528

  13. Self-sequencing of amino acids and origins of polyfunctional protocells

    NASA Technical Reports Server (NTRS)

    Fox, S. W.

    1984-01-01

    The role of proteins in the origin of living things is discussed. It has been experimentally established that amino acids can sequence themselves under simulated geological conditions with highly nonrandom products which accordingly contain diverse information. Multiple copies of each type of macromolecule are formed, resulting in greater power for any protoenzymic molecule than would accrue from a single copy of each type. Thermal proteins are readily incorporated into laboratory protocells. The experimental evidence for original polyfunctional protocells is discussed.

  14. Snake venom. The amino acid sequence of protein A from Dendroaspis polylepis polylepis (black mamba) venom.

    PubMed

    Joubert, F J; Strydom, D J

    1980-12-01

    Protein A from Dendroaspis polylepis polylepis venom comprises 81 amino acids, including ten half-cystine residues. The complete primary structures of protein A and its variant A' were elucidated. The sequences of proteins A and A', which differ in a single position, show no homology with various neurotoxins and non-neurotoxic proteins and represent a new type of elapid venom protein. PMID:7461607

  15. Characterization of the microbial acid mine drainage microbial community using culturing and direct sequencing techniques.

    PubMed

    Auld, Ryan R; Myre, Maxine; Mykytczuk, Nadia C S; Leduc, Leo G; Merritt, Thomas J S

    2013-05-01

    We characterized the bacterial community from an AMD tailings pond using both classical culturing and modern direct sequencing techniques and compared the two methods. Acid mine drainage (AMD) is produced by the environmental and microbial oxidation of minerals dissolved from mining waste. Surprisingly, we know little about the microbial communities associated with AMD, despite the fundamental ecological roles of these organisms and large-scale economic impact of these waste sites. AMD microbial communities have classically been characterized by laboratory culturing-based techniques and more recently by direct sequencing of marker gene sequences, primarily the 16S rRNA gene. In our comparison of the techniques, we find that their results are complementary, overall indicating very similar community structure with similar dominant species, but with each method identifying some species that were missed by the other. We were able to culture the majority of species that our direct sequencing results indicated were present, primarily species within the Acidithiobacillus and Acidiphilium genera, although estimates of relative species abundance were only obtained from direct sequencing. Interestingly, our culture-based methods recovered four species that had been overlooked from our sequencing results because of the rarity of the marker gene sequences, likely members of the rare biosphere. Further, direct sequencing indicated that a single genus, completely missed in our culture-based study, Legionella, was a dominant member of the microbial community. Our results suggest that while either method does a reasonable job of identifying the dominant members of the AMD microbial community, together the methods combine to give a more complete picture of the true diversity of this environment. PMID:23485423

  16. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... approved by the Director of the Federal Register in accordance with 5 U.S.C. 552(a) and 1 CFR part 51... base or modified or unusual amino acid may be presented in a given sequence as the corresponding unmodified base or amino acid if the modified base or modified or unusual amino acid is one of those...

  17. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... approved by the Director of the Federal Register in accordance with 5 U.S.C. 552(a) and 1 CFR part 51... base or modified or unusual amino acid may be presented in a given sequence as the corresponding unmodified base or amino acid if the modified base or modified or unusual amino acid is one of those...

  18. Nanopore Analysis of Nucleic Acids: Single-Molecule Studies of Molecular Dynamics, Structure, and Base Sequence

    NASA Astrophysics Data System (ADS)

    Olasagasti, Felix; Deamer, David W.

    Nucleic acids are linear polynucleotides in which each base is covalently linked to a pentose sugar and a phosphate group carrying a negative charge. If a pore having roughly the cross-sectional diameter of a single-stranded nucleic acid is embedded in a thin membrane and a voltage of 100 mV or more is applied, individual nucleic acids in solution can be captured by the electrical field in the pore and translocated through by single-molecule electrophoresis. The dimensions of the pore cannot accommodate anything larger than a single strand, so each base in the molecule passes through the pore in strict linear sequence. The nucleic acid strand occupies a large fraction of the pore's volume during translocation and therefore produces a transient blockade of the ionic current created by the applied voltage. If it could be demonstrated that each nucleotide in the polymer produced a characteristic modulation of the ionic current during its passage through the nanopore, the sequence of current modulations would reflect the sequence of bases in the polymer. According to this basic concept, nanopores are analogous to a Coulter counter that detects nanoscopic molecules rather than microscopic ones [1,2]. However, the advantage of nanopores is that individual macromolecules can be characterized because different chemical and physical properties affect their passage through the pore. Because macromolecules can be captured in the pore as well as translocated, the nanopore can be used to detect individual functional complexes that form between a nucleic acid and an enzyme. No other technique has this capability.

  19. Nonsense mutation in the glycoprotein Ibα coding sequence associated with Bernard-Soulier syndrome

    SciTech Connect

    Ware, J.; Russell, S.R.; Vicente, V.; Scharf, R.E.; Tomer, A.; McMillian, R.; Ruggeri, Z.M.

    1990-03-01

    Three distinct gene products, the α and β chains of glycoprotein (GP) Ib and GP IX, constitute the platelet membrane GP Ib-IX complex, a receptor for von Willebrand factor and thrombin involved in platelet adhesion and aggregation. Defective function of the GP Ib-IX complex is the hallmark of a rare congenital bleeding disorder of still undefined pathogenesis, the Bernard-Soulier syndrome. The authors have analyzed the molecular basis of the disease in one patient in whom immunoblotting of solubilized platelets demonstrated absence of normal GP Ibα but presence of a smaller immunoreactive species. The truncated polypeptide was also present, along with normal protein, in platelets from the patient's mother and two of his four children. Genetic characterization identified a nucleotide transition changing the Trp-343 codon (TGG) to a nonsense codon (TGA). Such a mutation explains the origin of the smaller GP Ibα, which by lacking half of the sequence on the carboxyl-terminal side, including the transmembrane domain, cannot be properly inserted in the platelet membrane. Both normal and mutant codons were found in the patient, suggesting that he is a compound heterozygote with a still unidentified defect in the other GP Ibα allele. Nonsense mutation and truncated GP Ibα polypeptide were found to cosegregate in four individuals through three generations and were associated with either Bernard-Soulier syndrome or carrier state phenotype. The molecular abnormality demonstrated in this family provides evidence that defective synthesis of GP Ibα alters the membrane expression of the GP Ib-IX complex and may be responsible for Bernard-Soulier syndrome.
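
    The codon change reported above (TGG to TGA at Trp-343) can be checked with a short sketch. The two-entry codon table is a correct subset of the standard genetic code; the helper names are hypothetical and are not taken from the paper.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Only the two codons mentioned in the abstract (standard genetic code).
CODON_TABLE = {"TGG": "Trp", "TGA": "Stop"}

def classify_substitution(ref, alt):
    """Transition: purine<->purine or pyrimidine<->pyrimidine; else transversion."""
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def describe_codon_change(ref_codon, alt_codon):
    """Summarise a single-nucleotide change between two codons."""
    diffs = [(i, r, a) for i, (r, a) in enumerate(zip(ref_codon, alt_codon))
             if r != a]
    if len(diffs) != 1:
        raise ValueError("expected exactly one nucleotide difference")
    index, ref, alt = diffs[0]
    return {
        "codon_position": index + 1,  # 1-based position within the codon
        "change": f"{ref}>{alt}",
        "type": classify_substitution(ref, alt),
        "effect": f"{CODON_TABLE[ref_codon]} -> {CODON_TABLE[alt_codon]}",
    }

result = describe_codon_change("TGG", "TGA")
```

    The result confirms that the change is a G>A transition at the third codon position and that it converts a tryptophan codon into a stop codon, consistent with the truncated GP Ibα polypeptide described in the abstract.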

  20. Complete amino acid sequence of a histidine-rich proteolytic fragment of human ceruloplasmin.

    PubMed

    Kingston, I B; Kingston, B L; Putnam, F W

    1979-04-01

    The complete amino acid sequence has been determined for a fragment of human ceruloplasmin [ferroxidase; iron(II):oxygen oxidoreductase, EC 1.16.3.1]. The fragment (designated Cp F5) contains 159 amino acid residues and has a molecular weight of 18,650; it lacks carbohydrate, is rich in histidine, and contains one free cysteine that may be part of a copper-binding site. This fragment is present in most commercial preparations of ceruloplasmin, probably owing to proteolytic degradation, but can also be obtained by limited cleavage of single-chain ceruloplasmin with plasmin. Cp F5 probably is an intact domain attached to the COOH-terminal end of single-chain ceruloplasmin via a labile interdomain peptide bond. A model of the secondary structure predicted by empirical methods suggests that almost one-third of the amino acid residues are distributed in alpha helices, about a third in beta-sheet structure, and the remainder in beta turns and unidentified structures. Computer analysis of the amino acid sequence has not demonstrated a statistically significant relationship between this ceruloplasmin fragment and any other protein, but there is some evidence for an internal duplication. PMID:287005

  1. Analysis of a nucleotide-binding site of 5-lipoxygenase by affinity labelling: binding characteristics and amino acid sequences.

    PubMed Central

    Zhang, Y Y; Hammarberg, T; Radmark, O; Samuelsson, B; Ng, C F; Funk, C D; Loscalzo, J

    2000-01-01

    5-Lipoxygenase (5LO) catalyses the first two steps in the biosynthesis of leukotrienes, which are inflammatory mediators derived from arachidonic acid. 5LO activity is stimulated by ATP; however, a consensus ATP-binding site or nucleotide-binding site has not been found in its protein sequence. In the present study, affinity and photoaffinity labelling of 5LO with 5'-p-fluorosulphonylbenzoyladenosine (FSBA) and 2-azido-ATP showed that 5LO bound to the ATP analogues quantitatively and specifically and that the incorporation of either analogue inhibited ATP stimulation of 5LO activity. The stoichiometry of the labelling was 1.4 mol of FSBA/mol of 5LO (of which ATP competed with 1 mol/mol) or 0.94 mol of 2-azido-ATP/mol of 5LO (of which ATP competed with 0.77 mol/mol). Labelling with FSBA prevented further labelling with 2-azido-ATP, indicating that the same binding site was occupied by both analogues. Other nucleotides (ADP, AMP, GTP, CTP and UTP) also competed with 2-azido-ATP labelling, suggesting that the site was a general nucleotide-binding site rather than a strict ATP-binding site. Ca(2+), which also stimulates 5LO activity, had no effect on the labelling of the nucleotide-binding site. Digestion with trypsin and peptide sequencing showed that two fragments of 5LO were labelled by 2-azido-ATP. These fragments correspond to residues 73-83 (KYWLNDDWYLK, in single-letter amino acid code) and 193-209 (FMHMFQSSWNDFADFEK) in the 5LO sequence. Trp-75 and Trp-201 in these peptides were modified by the labelling, suggesting that they were immediately adjacent to the C-2 position of the adenine ring of ATP. Given the stoichiometry of the labelling, the two peptide sequences of 5LO were probably near each other in the enzyme's tertiary structure, composing or surrounding the ATP-binding site of 5LO. PMID:11042125
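
    The residue numbering in this abstract can be cross-checked against the two peptide sequences it quotes. The sketch below uses only the peptides and positions given above; the helper name is hypothetical.

```python
def residue_at(peptide, start, position):
    """Return the one-letter residue at a protein position, given a peptide
    whose first residue sits at protein position `start` (1-based)."""
    index = position - start
    if not 0 <= index < len(peptide):
        raise IndexError("position lies outside this peptide")
    return peptide[index]

# Labelled tryptic peptides and their first residue numbers, from the abstract.
PEPTIDES = {
    73: "KYWLNDDWYLK",         # residues 73-83
    193: "FMHMFQSSWNDFADFEK",  # residues 193-209
}

trp75 = residue_at(PEPTIDES[73], 73, 75)
trp201 = residue_at(PEPTIDES[193], 193, 201)
```

    Both lookups return W (tryptophan), and the peptide lengths (11 and 17 residues) match the stated ranges 73-83 and 193-209, so the reported Trp-75 and Trp-201 are consistent with the quoted sequences.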

  2. Immunoreactivity of polyclonal antibodies generated against the carboxy terminus of the predicted amino acid sequence of the Huntington disease gene

    SciTech Connect

    Alkatib, G.; Graham, R.; Pelmear-Telenius, A.

    1994-09-01

    A cDNA fragment spanning the 3′-end of the Huntington disease gene (from 8052 to 9252) was cloned into a prokaryotic expression vector containing the E. coli lac promoter and a portion of the coding sequence for β-galactosidase. The truncated β-galactosidase gene was cleaved with BamHI and fused in frame to the BamHI fragment of the Huntington disease gene 3′-end. Expression analysis of proteins made in E. coli revealed that 20-30% of the total cellular proteins was represented by the β-galactosidase-huntingtin fusion protein. The identity of the Huntington disease protein amino acid sequences was confirmed by protein sequence analysis. Affinity chromatography was used to purify large quantities of the fusion protein from bacterial cell lysates. Affinity-purified proteins were used to immunize New Zealand white rabbits for antibody production. The generated polyclonal antibodies were used to immunoprecipitate the Huntington disease gene product expressed in a neuroblastoma cell line. In this cell line the antibodies precipitated two protein bands of apparent gel migrations of 200 and 150 kd which together correspond to the calculated molecular weight of the Huntington disease gene product (350 kd). Immunoblotting experiments revealed the presence of a large precursor protein in the range of 350-750 kd which is in agreement with the predicted molecular weight of the protein without post-translational modifications. These results indicate that the huntingtin protein is cleaved into two subunits in this neuroblastoma cell line and imply that cleavage of a large precursor protein may contribute to its biological activity. Experiments are ongoing to determine the precursor-product relationship and to examine the synthesis of the huntingtin protein in freshly isolated rat brains, and to determine cellular and subcellular distribution of the gene product.

  3. The amino acid sequence of Lady Amherst's pheasant (Chrysolophus amherstiae) and golden pheasant (Chrysolophus pictus) egg-white lysozymes.

    PubMed

    Araki, T; Kuramoto, M; Torikata, T

    1990-09-01

    The amino acids of Lady Amherst's pheasant and golden pheasant egg-white lysozymes have been sequenced. The carboxymethylated lysozymes were digested with trypsin, followed by sequencing of the tryptic peptides. Lady Amherst's pheasant lysozyme proved to consist of 129 amino acid residues, and a relative molecular mass of 14,423 Da was calculated. This lysozyme had 6 amino acid substitutions when compared with hen egg-white lysozyme: Phe3 to Tyr, His15 to Leu, Gln41 to His, Asn77 to His, Gln121 to Asn, and a newly found substitution of Ile124 to Thr. The amino acid sequence of golden pheasant lysozyme was identical to that of Lady Amherst's pheasant lysozyme. The phylogenetic tree constructed by comparing the amino acid sequences of lysozymes from phasianoid birds revealed a minimum genetic distance between these pheasants and the turkey-peafowl group. PMID:1368578

  4. Complete Genome Sequence of a thermotolerant sporogenic lactic acid bacterium, Bacillus coagulans strain 36D1

    PubMed Central

    Rhee, Mun Su; Moritz, Brélan E.; Xie, Gary; Glavina del Rio, T.; Dalin, E.; Tice, H.; Bruce, D.; Goodwin, L.; Chertkov, O.; Brettin, T.; Han, C.; Detter, C.; Pitluck, S.; Land, Miriam L.; Patel, Milind; Ou, Mark; Harbrucker, Roberta; Ingram, Lonnie O.; Shanmugam, K. T.

    2011-01-01

    Bacillus coagulans is a ubiquitous soil bacterium that grows at 50-55 °C and pH 5.0 and ferments various sugars that constitute plant biomass to L(+)-lactic acid. The ability of this sporogenic lactic acid bacterium to grow at 50-55 °C and pH 5.0 makes this organism an attractive microbial biocatalyst for production of optically pure lactic acid at industrial scale, not only from glucose derived from cellulose but also from xylose, a major constituent of hemicellulose. This bacterium is also considered a potential probiotic. The complete genome sequence of a representative strain, B. coagulans strain 36D1, is presented and discussed. PMID:22675583

  5. Complete amino acid sequence of globin chains and biological activity of fragmented crocodile hemoglobin (Crocodylus siamensis).

    PubMed

    Srihongthong, Saowaluck; Pakdeesuwan, Anawat; Daduang, Sakda; Araki, Tomohiro; Dhiravisit, Apisak; Thammasirirak, Sompong

    2012-08-01

    Hemoglobin, α-chain, β-chain and fragmented hemoglobin of Crocodylus siamensis demonstrated both antibacterial and antioxidant activities. Antibacterial and antioxidant properties of the hemoglobin did not depend on the heme structure but could result from the compositions of amino acid residues and structures present in their primary structure. Furthermore, thirteen purified active peptides were obtained by RP-HPLC analyses, corresponding to fragments in the α-globin chain and the β-globin chain which are mostly located at the N-terminal and C-terminal parts. These active peptides operate on the bacterial cell membrane. The globin chains of Crocodylus siamensis showed similar amino acids to the sequences of Crocodylus niloticus. The novel amino acid substitutions of α-chain and β-chain are not associated with the heme binding site or the bicarbonate ion binding site, but could be important through their interactions with membranes of bacteria. PMID:22648692

  6. [Partial sequence homology of FtsZ in phylogenetics analysis of lactic acid bacteria].

    PubMed

    Zhang, Bin; Dong, Xiu-zhu

    2005-10-01

    FtsZ is a structurally conserved protein that is universal among the prokaryotes. It plays a key role in prokaryote cell division. A partial fragment of the ftsZ gene about 800 bp in length was amplified and sequenced, and a partial FtsZ protein phylogenetic tree for the lactic acid bacteria was constructed. Comparison of the FtsZ phylogenetic tree with the 16S rDNA tree showed that the two trees were similar in topology. Both trees revealed that Pediococcus spp. were closely related to the L. casei group of Lactobacillus spp., but less closely related to other lactic acid cocci such as Enterococcus and Streptococcus. The results also showed that the discriminative power of FtsZ was higher than that of 16S rDNA at both the inter-species and inter-genus levels, and that FtsZ could be a very useful tool in species identification of lactic acid bacteria. PMID:16342751

  7. Comparative characterization of random-sequence proteins consisting of 5, 12, and 20 kinds of amino acids.

    PubMed

    Tanaka, Junko; Doi, Nobuhide; Takashima, Hideaki; Yanagawa, Hiroshi

    2010-04-01

    Screening of functional proteins from a random-sequence library has been used to evolve novel proteins in the field of evolutionary protein engineering. However, random-sequence proteins consisting of the 20 natural amino acids tend to aggregate, and the occurrence rate of functional proteins in a random-sequence library is low. From the viewpoint of the origin of life, it has been proposed that primordial proteins consisted of a limited set of amino acids that could have been abundantly formed early during chemical evolution. We have previously found that members of a random-sequence protein library constructed with five primitive amino acids show high solubility (Doi et al., Protein Eng Des Sel 2005;18:279-284). Although such a library is expected to be appropriate for finding functional proteins, the functionality may be limited, because they have no positively charged amino acid. Here, we constructed three libraries of 120-amino acid, random-sequence proteins using alphabets of 5, 12, and 20 amino acids by preselection using mRNA display (to eliminate sequences containing stop codons and frameshifts) and characterized and compared the structural properties of random-sequence proteins arbitrarily chosen from these libraries. We found that random-sequence proteins constructed with the 12-member alphabet (including five primitive amino acids and positively charged amino acids) have higher solubility than those constructed with the 20-member alphabet, though other biophysical properties are very similar in the two libraries. Thus, a library of moderate complexity constructed from 12 amino acids may be a more appropriate resource for functional screening than one constructed from 20 amino acids. PMID:20162614

  8. N-Terminal Amino Acid Sequence Determination of Proteins by N-Terminal Dimethyl Labeling: Pitfalls and Advantages When Compared with Edman Degradation Sequence Analysis.

    PubMed

    Chang, Elizabeth; Pourmal, Sergei; Zhou, Chun; Kumar, Rupesh; Teplova, Marianna; Pavletich, Nikola P; Marians, Kenneth J; Erdjument-Bromage, Hediye

    2016-07-01

    In recent history, alternative approaches to Edman sequencing have been investigated, and to this end, the Association of Biomolecular Resource Facilities (ABRF) Protein Sequencing Research Group (PSRG) initiated studies in 2014 and 2015, looking into bottom-up and top-down N-terminal (Nt) dimethyl derivatization of standard quantities of intact proteins with the aim to determine Nt sequence information. We have expanded this initiative and used low picomole amounts of myoglobin to determine the efficiency of Nt-dimethylation. Application of this approach on protein domains, generated by limited proteolysis of overexpressed proteins, confirms that it is a universal labeling technique and is very sensitive when compared with Edman sequencing. Finally, we compared Edman sequencing and Nt-dimethylation of the same polypeptide fragments; results confirm that there is agreement in the identity of the Nt amino acid sequence between these 2 methods. PMID:27006647

  9. N-Terminal Amino Acid Sequence Determination of Proteins by N-Terminal Dimethyl Labeling: Pitfalls and Advantages When Compared with Edman Degradation Sequence Analysis

    PubMed Central

    Chang, Elizabeth; Pourmal, Sergei; Zhou, Chun; Kumar, Rupesh; Teplova, Marianna; Pavletich, Nikola P.; Marians, Kenneth J.

    2016-01-01

    In recent history, alternative approaches to Edman sequencing have been investigated, and to this end, the Association of Biomolecular Resource Facilities (ABRF) Protein Sequencing Research Group (PSRG) initiated studies in 2014 and 2015, looking into bottom-up and top-down N-terminal (Nt) dimethyl derivatization of standard quantities of intact proteins with the aim to determine Nt sequence information. We have expanded this initiative and used low picomole amounts of myoglobin to determine the efficiency of Nt-dimethylation. Application of this approach on protein domains, generated by limited proteolysis of overexpressed proteins, confirms that it is a universal labeling technique and is very sensitive when compared with Edman sequencing. Finally, we compared Edman sequencing and Nt-dimethylation of the same polypeptide fragments; results confirm that there is agreement in the identity of the Nt amino acid sequence between these 2 methods. PMID:27006647

  10. Partial amino acid sequence of fructose-1,6-bisphosphatase from the blue-green algae Synechococcus leopoliensis.

    PubMed

    Marcus, F; Latshaw, S P; Steup, M; Gerbling, K P

    1989-08-01

    Purified fructose-1,6-bisphosphatase from the cyanobacterium Synechococcus leopoliensis was S-carboxymethylated and cleaved with trypsin. The resulting peptides were purified by reversed-phase high performance liquid chromatography and the amino acid sequence of six of the purified peptides was determined by gas-phase microsequencing. The results revealed sequence homology with other fructose-1,6-bisphosphatases. The obtained sequence data provides information required for the design of oligonucleotide hybridization probes to screen existing libraries of cyanobacterial DNA. The determination of the amino acid sequence of cyanobacterial proteins may yield important information with respect to the endosymbiotic theory of evolution. PMID:2550924

  11. Protein sequence analysis by incorporating modified chaos game and physicochemical properties into Chou's general pseudo amino acid composition.

    PubMed

    Xu, Chunrui; Sun, Dandan; Liu, Shenghui; Zhang, Yusen

    2016-10-01

    In this contribution we introduced a novel graphical method to compare protein sequences. By mapping a protein sequence into 3D space based on codons and physicochemical properties of 20 amino acids, we are able to get a unique P-vector from the 3D curve. This approach is consistent with wobble theory of amino acids. We compute the distance between sequences by their P-vectors to measure similarities/dissimilarities among protein sequences. Finally, we use our method to analyze four datasets and get better results compared with previous approaches. PMID:27375218

  12. Molecular cloning of the α-subunit of human prolyl 4-hydroxylase: The complete cDNA-derived amino acid sequence and evidence for alternative splicing of RNA transcripts

    SciTech Connect

    Helaakoski, T.; Vuori, K.; Myllylae, R.; Kivirikko, K.I.; Pihlajaniemi, T.

    1989-06-01

    Prolyl 4-hydroxylase, an α2β2 tetramer, catalyzes the formation of 4-hydroxyproline in collagens by the hydroxylation of proline residues in peptide linkages. The authors report here on the isolation of cDNA clones encoding the α-subunit of the enzyme from human tumor HT-1080, placenta, and fibroblast cDNA libraries. Eight overlapping clones covering almost all of the corresponding 3,000-nucleotide mRNA, including all the coding sequences, were characterized. These clones encode a polypeptide of 517 amino acid residues and a signal peptide of 17 amino acids. Previous characterization of cDNA clones for the β-subunit of prolyl 4-hydroxylase has indicated that its C terminus has the amino acid sequence Lys-Asp-Gly-Leu, which, it has been suggested, is necessary for the retention of a polypeptide within the lumen of the endoplasmic reticulum. The α-subunit does not have this C-terminal sequence, and thus one function of the β-subunit in the prolyl 4-hydroxylase tetramer appears to be to retain the enzyme within this cell organelle. Southern blot analyses of human genomic DNA with a cDNA probe for the α-subunit suggested the presence of only one gene encoding the two types of mRNA, which appear to result from mutually exclusive alternative splicing of primary transcripts of one gene.

  13. Nucleotide sequence of the phosphoglycerate kinase gene from the extreme thermophile Thermus thermophilus. Comparison of the deduced amino acid sequence with that of the mesophilic yeast phosphoglycerate kinase.

    PubMed Central

    Bowen, D; Littlechild, J A; Fothergill, J E; Watson, H C; Hall, L

    1988-01-01

    Using oligonucleotide probes derived from amino acid sequencing information, the structural gene for phosphoglycerate kinase from the extreme thermophile, Thermus thermophilus, was cloned in Escherichia coli and its complete nucleotide sequence determined. The gene consists of an open reading frame corresponding to a protein of 390 amino acid residues (calculated Mr 41,791) with an extreme bias for G or C (93.1%) in the codon third base position. Comparison of the deduced amino acid sequence with that of the corresponding mesophilic yeast enzyme indicated a number of significant differences. These are discussed in terms of the unusual codon bias and their possible role in enhanced protein thermal stability. PMID:3052437
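
The third-position G or C bias quoted above (often called GC3) can be computed directly from a coding sequence. The following is a minimal sketch, not the authors' procedure: the function name `gc3_fraction` and the five-codon toy sequence are invented for illustration, and the sequence is assumed to start in frame.

```python
def gc3_fraction(cds: str) -> float:
    """Return the fraction of codons whose third base is G or C.

    Assumes `cds` is an in-frame coding sequence; a trailing
    incomplete codon, if any, is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # drop any incomplete final codon
    third_bases = cds[2:usable:3]         # bases at codon positions 3, 6, 9, ...
    if not third_bases:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Hypothetical toy sequence (not the T. thermophilus gene):
# codons ATG GCC AAG CTG TAA have third bases G, C, G, G, A.
toy_cds = "ATGGCCAAGCTGTAA"
print(f"GC3 = {gc3_fraction(toy_cds):.1%}")
```

Applied to the real 390-codon open reading frame, the same calculation would be expected to give a value near the 93.1% reported in the abstract.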

  14. Three-Dimensional Algebraic Models of the tRNA Code and 12 Graphs for Representing the Amino Acids.

    PubMed

    José, Marco V; Morgado, Eberto R; Guimarães, Romeu Cardoso; Zamudio, Gabriel S; de Farías, Sávio Torres; Bobadilla, Juan R; Sosa, Daniela

    2014-01-01

    Three-dimensional algebraic models, also called Genetic Hotels, are developed to represent the Standard Genetic Code, the Standard tRNA Code (S-tRNA-C), and the Human tRNA code (H-tRNA-C). New algebraic concepts are introduced to be able to describe these models, to wit, the generalization of the 2n-Klein Group and the concept of a subgroup coset with a tail. We found that the H-tRNA-C displayed broken symmetries in regard to the S-tRNA-C, which is highly symmetric. We also show that there are only 12 ways to represent each of the corresponding phenotypic graphs of amino acids. The averages of statistical centrality measures of the 12 graphs for each of the three codes are carried out and they are statistically compared. The phenotypic graphs of the S-tRNA-C display a common triangular prism of amino acids in 10 out of the 12 graphs, whilst the corresponding graphs for the H-tRNA-C display only two triangular prisms. The graphs exhibit disjoint clusters of amino acids when their polar requirement values are used. We contend that the S-tRNA-C is in a frozen-like state, whereas the H-tRNA-C may be in an evolving state. PMID:25370377

  15. Three-Dimensional Algebraic Models of the tRNA Code and 12 Graphs for Representing the Amino Acids

    PubMed Central

    José, Marco V.; Morgado, Eberto R.; Guimarães, Romeu Cardoso; Zamudio, Gabriel S.; de Farías, Sávio Torres; Bobadilla, Juan R.; Sosa, Daniela

    2014-01-01

    Three-dimensional algebraic models, also called Genetic Hotels, are developed to represent the Standard Genetic Code, the Standard tRNA Code (S-tRNA-C), and the Human tRNA code (H-tRNA-C). New algebraic concepts are introduced to be able to describe these models, to wit, the generalization of the 2n-Klein Group and the concept of a subgroup coset with a tail. We found that the H-tRNA-C displayed broken symmetries in regard to the S-tRNA-C, which is highly symmetric. We also show that there are only 12 ways to represent each of the corresponding phenotypic graphs of amino acids. The averages of statistical centrality measures of the 12 graphs for each of the three codes are carried out and they are statistically compared. The phenotypic graphs of the S-tRNA-C display a common triangular prism of amino acids in 10 out of the 12 graphs, whilst the corresponding graphs for the H-tRNA-C display only two triangular prisms. The graphs exhibit disjoint clusters of amino acids when their polar requirement values are used. We contend that the S-tRNA-C is in a frozen-like state, whereas the H-tRNA-C may be in an evolving state. PMID:25370377

  16. Bacteria obtained from a sequencing batch reactor that are capable of growth on dehydroabietic acid.

    PubMed Central

    Mohn, W W

    1995-01-01

    Eleven isolates capable of growth on the resin acid dehydroabietic acid (DhA) were obtained from a sequencing batch reactor designed to treat a high-strength process stream from a paper mill. The isolates belonged to two groups, represented by strains DhA-33 and DhA-35, which were characterized. In the bioreactor, bacteria like DhA-35 were more abundant than those like DhA-33. The population in the bioreactor of organisms capable of growth on DhA was estimated to be 1.1 × 10^6 propagules per ml, based on a most-probable-number determination. Analysis of small-subunit rRNA partial sequences indicated that DhA-33 was most closely related to Sphingomonas yanoikuyae (Sab = 0.875) and that DhA-35 was most closely related to Zoogloea ramigera (Sab = 0.849). Both isolates additionally grew on other abietanes, i.e., abietic and palustric acids, but not on the pimaranes, pimaric and isopimaric acids. For DhA-33 and DhA-35 with DhA as the sole organic substrate, doubling times were 2.7 and 2.2 h, respectively, and growth yields were 0.30 and 0.25 g of protein per g of DhA, respectively. Glucose as a cosubstrate stimulated growth of DhA-33 on DhA and stimulated DhA degradation by the culture. Pyruvate as a cosubstrate did not stimulate growth of DhA-35 on DhA and reduced the specific rate of DhA degradation of the culture. DhA induced DhA and abietic acid degradation activities in both strains, and these activities were heat labile. Cell suspensions of both strains consumed DhA at a rate of 6 μmol mg of protein^-1 h^-1. (ABSTRACT TRUNCATED AT 250 WORDS) PMID:7793937
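
The doubling times reported above can be converted to specific growth rates with the standard exponential-growth relation mu = ln(2) / t_d. The sketch below assumes simple exponential growth; the function name is illustrative and the rates are derived here, not reported in the abstract.

```python
import math


def specific_growth_rate(doubling_time_h: float) -> float:
    """Specific growth rate (per hour) for exponential growth: mu = ln(2) / t_d."""
    if doubling_time_h <= 0:
        raise ValueError("doubling time must be positive")
    return math.log(2) / doubling_time_h


# Doubling times from the abstract (hours), with DhA as sole organic substrate.
doubling_times = {"DhA-33": 2.7, "DhA-35": 2.2}

for strain, t_d in doubling_times.items():
    print(f"{strain}: mu = {specific_growth_rate(t_d):.3f} per hour")
```

Under this assumption the rates come out at roughly 0.26 per hour for DhA-33 and 0.32 per hour for DhA-35.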

  17. Sequencing the GRHL3 Coding Region Reveals Rare Truncating Mutations and a Common Susceptibility Variant for Nonsyndromic Cleft Palate.

    PubMed

    Mangold, Elisabeth; Böhmer, Anne C; Ishorst, Nina; Hoebel, Ann-Kathrin; Gültepe, Pinar; Schuenke, Hannah; Klamt, Johanna; Hofmann, Andrea; Gölz, Lina; Raff, Ruth; Tessmann, Peter; Nowak, Stefanie; Reutter, Heiko; Hemprich, Alexander; Kreusch, Thomas; Kramer, Franz-Josef; Braumann, Bert; Reich, Rudolf; Schmidt, Gül; Jäger, Andreas; Reiter, Rudolf; Brosch, Sibylle; Stavusis, Janis; Ishida, Miho; Seselgyte, Rimante; Moore, Gudrun E; Nöthen, Markus M; Borck, Guntram; Aldhorae, Khalid A; Lace, Baiba; Stanier, Philip; Knapp, Michael; Ludwig, Kerstin U

    2016-04-01

    Nonsyndromic cleft lip with/without cleft palate (nsCL/P) and nonsyndromic cleft palate only (nsCPO) are the most frequent subphenotypes of orofacial clefts. A common syndromic form of orofacial clefting is Van der Woude syndrome (VWS) where individuals have CL/P or CPO, often but not always associated with lower lip pits. Recently, ∼5% of VWS-affected individuals were identified with mutations in the grainy head-like 3 gene (GRHL3). To investigate GRHL3 in nonsyndromic clefting, we sequenced its coding region in 576 Europeans with nsCL/P and 96 with nsCPO. Most strikingly, nsCPO-affected individuals had a higher minor allele frequency for rs41268753 (0.099) than control subjects (0.049; p = 1.24 × 10^-2). This association was replicated in nsCPO/control cohorts from Latvia, Yemen, and the UK (p_combined = 2.63 × 10^-5; OR_allelic = 2.46 [95% CI 1.6-3.7]) and reached genome-wide significance in combination with imputed data from a GWAS in nsCPO triads (p = 2.73 × 10^-9). Notably, rs41268753 is not associated with nsCL/P (p = 0.45). rs41268753 encodes the highly conserved p.Thr454Met (c.1361C>T) (GERP = 5.3), which prediction programs denote as deleterious, has a CADD score of 29.6, and increases protein binding capacity in silico. Sequencing also revealed four novel truncating GRHL3 mutations including two that were de novo in four families, where all nine individuals harboring mutations had nsCPO. This is important for genetic counseling: given that VWS is rare compared to nsCPO, our data suggest that dominant GRHL3 mutations are more likely to cause nonsyndromic than syndromic CPO. Thus, with rare dominant mutations and a common risk variant in the coding region, we have identified an important contribution for GRHL3 in nsCPO. PMID:27018475
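
As a rough illustration of how an allelic odds ratio relates to minor allele frequencies, the sketch below applies the textbook formula OR = [p1 / (1 - p1)] / [p0 / (1 - p0)] to the two frequencies quoted for the sequenced cohort. This is an assumption-laden back-of-envelope calculation, not the authors' analysis: it ignores sample sizes and confidence intervals, and it uses only the discovery frequencies, so it is not expected to reproduce the combined OR of 2.46.

```python
def allelic_odds_ratio(maf_cases: float, maf_controls: float) -> float:
    """Allelic odds ratio from minor allele frequencies in cases and controls."""
    for freq in (maf_cases, maf_controls):
        if not 0.0 < freq < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    odds_cases = maf_cases / (1.0 - maf_cases)
    odds_controls = maf_controls / (1.0 - maf_controls)
    return odds_cases / odds_controls


# Frequencies of rs41268753 quoted in the abstract (nsCPO cases vs. controls).
or_discovery = allelic_odds_ratio(0.099, 0.049)
print(f"Allelic OR from the quoted frequencies: {or_discovery:.2f}")
```

The result is a little above 2, which is in the same direction as, but smaller than, the combined estimate across the replication cohorts.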

  18. Novel method for PIK3CA mutation analysis: locked nucleic acid--PCR sequencing.

    PubMed

    Ang, Daphne; O'Gara, Rebecca; Schilling, Amy; Beadling, Carol; Warrick, Andrea; Troxell, Megan L; Corless, Christopher L

    2013-05-01

    Somatic mutations in PIK3CA are commonly seen in invasive breast cancer and several other carcinomas, occurring in three hotspots: codons 542 and 545 of exon 9 and in codon 1047 of exon 20. We designed a locked nucleic acid (LNA)-PCR sequencing assay to detect low levels of mutant PIK3CA DNA with attention to avoiding amplification of a pseudogene on chromosome 22 that has >95% homology to exon 9 of PIK3CA. We tested 60 FFPE breast DNA samples with known PIK3CA mutation status (48 cases had one or more PIK3CA mutations, and 12 were wild type) as identified by PCR-mass spectrometry. PIK3CA exons 9 and 20 were amplified in the presence or absence of LNA-oligonucleotides designed to bind to the wild-type sequences for codons 542, 545, and 1047, and partially suppress their amplification. LNA-PCR sequencing confirmed all 51 PIK3CA mutations; however, the mutation detection rate by standard Sanger sequencing was only 69% (35 of 51). Of the 12 PIK3CA wild-type cases, LNA-PCR sequencing detected three additional H1047R mutations in "normal" breast tissue and one E545K in usual ductal hyperplasia. Histopathological review of these three normal breast specimens showed columnar cell change in two (both with known H1047R mutations) and apocrine metaplasia in one. The novel LNA-PCR shows higher sensitivity than standard Sanger sequencing and did not amplify the known pseudogene. PMID:23541593

  19. Bile acid sulfotransferase I from rat liver sulfates bile acids and 3-hydroxy steroids: purification, N-terminal amino acid sequence, and kinetic properties.

    PubMed

    Barnes, S; Buchina, E S; King, R J; McBurnett, T; Taylor, K B

    1989-04-01

    A bile acid:3'phosphoadenosine-5'phosphosulfate:sulfotransferase (BAST I) from adult female rat liver cytosol has been purified 157-fold by a two-step isolation procedure. The N-terminal amino acid sequence of the 30,000 subunit has been determined for the first 35 residues. The Vmax of purified BAST I is 18.7 nmol/min per mg protein with N-(3-hydroxy-5β-cholanoyl)glycine (glycolithocholic acid) as substrate, comparable to that of the corresponding purified human BAST (Chen, L-J., and I. H. Segel, 1985. Arch. Biochem. Biophys. 241: 371-379). BAST I activity has a broad pH optimum from 5.5-7.5. Although maximum activity occurs with 5 mM MgCl2, Mg2+ is not essential for BAST I activity. The greatest sulfotransferase activity and the highest substrate affinity are observed with bile acids or steroids that have a steroid nucleus containing a 3β-hydroxy group and a 5-6 double bond or a trans A-B ring junction. These substrates have normal hyperbolic initial velocity curves with substrate inhibition occurring above 5 μM. Of the saturated 5β-bile acids, those with a single 3-hydroxy group are the most active. The addition of a second hydroxy group at the 6- or 7-position eliminates more than 99% of the activity. In contrast, 3α,12α-dihydroxy-5β-cholan-24-oic acid (deoxycholic acid) is an excellent substrate. The initial velocity curves for glycolithocholic and deoxycholic acid conjugates are sigmoidal rather than hyperbolic, suggestive of an allosteric effect. Maximum activity is observed at 80 μM for glycolithocholic acid. All substrates, bile acids and steroids, are inhibited by the 5β-bile acid, 3-keto-5β-cholanoic acid. The data suggest that BAST I is the same protein as hydrosteroid sulfotransferase 2 (Marcus, C. J., et al. 1980. Anal. Biochem. 107: 296-304). PMID:2754334

  20. Sequence-defined bioactive macrocycles via an acid-catalysed cascade reaction

    NASA Astrophysics Data System (ADS)

    Porel, Mintu; Thornlow, Dana N.; Phan, Ngoc N.; Alabi, Christopher A.

    2016-06-01

    Synthetic macrocycles derived from sequence-defined oligomers are a unique structural class whose ring size, sequence and structure can be tuned via precise organization of the primary sequence. Similar to peptides and other peptidomimetics, these well-defined synthetic macromolecules become pharmacologically relevant when bioactive side chains are incorporated into their primary sequence. In this article, we report the synthesis of oligothioetheramide (oligoTEA) macrocycles via a one-pot acid-catalysed cascade reaction. The versatility of the cyclization chemistry and modularity of the assembly process was demonstrated via the synthesis of >20 diverse oligoTEA macrocycles. Structural characterization via NMR spectroscopy revealed the presence of conformational isomers, which enabled the determination of local chain dynamics within the macromolecular structure. Finally, we demonstrate the biological activity of oligoTEA macrocycles designed to mimic facially amphiphilic antimicrobial peptides. The preliminary results indicate that macrocyclic oligoTEAs with just two-to-three cationic charge centres can elicit potent antibacterial activity against Gram-positive and Gram-negative bacteria.

  1. Unconventional amino acid sequence of the sun anemone (Stoichactis helianthus) polypeptide neurotoxin

    SciTech Connect

    Kem, W.; Dunn, B.; Parten, B.; Pennington, M.; Price, D.

    1986-05-01

    A 5000 dalton polypeptide neurotoxin (Sh-NI) purified by G50 Sephadex, P-cellulose, and SP-Sephadex chromatography was homogeneous by isoelectric focusing. Sh-NI was highly toxic to crayfish (LD50 0.6 μg/kg) but without effect upon mice at 15,000 μg/kg (i.p. injection). The reduced, ³H-carboxymethylated toxin and its fragments were subjected to automatic Edman degradation and the resulting PTH-amino acids were identified by HPLC, back hydrolysis, and scintillation counting. Peptides resulting from proteolytic (clostripain, staphylococcal protease) and chemical (tryptophan) cleavage were sequenced. The sequence is: AACKCDDEGPDIRTAPLTGTVDLGSCNAGWEKCASYYTIIADCCRKKK. This sequence differs considerably from the homologous Anemonia and Anthopleura toxins; many of the identical residues (6 half-cystines, G9, P10, R13, G19, G29, W30) are probably critical for folding rather than receptor recognition. However, the Sh-NI sequence closely resembles Radioanthus macrodactylus neurotoxin III and R. paumotensis II. The authors propose that Sh-NI and related Radioanthus toxins act upon a different site on the sodium channel.
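
The residue positions named in the abstract can be checked against the printed sequence with a few lines of code. This is only a consistency check on the text as given; the variable and function names are illustrative, and positions are 1-based as in the abstract.

```python
# Sh-NI sequence exactly as printed in the abstract (one-letter code).
SH_NI = "AACKCDDEGPDIRTAPLTGTVDLGSCNAGWEKCASYYTIIADCCRKKK"

# Residues the abstract lists as shared with related toxins (1-based positions).
CONSERVED = {9: "G", 10: "P", 13: "R", 19: "G", 29: "G", 30: "W"}


def residue_at(sequence: str, position: int) -> str:
    """Return the residue at a 1-based position."""
    return sequence[position - 1]


half_cystines = [i + 1 for i, aa in enumerate(SH_NI) if aa == "C"]
mismatches = {pos: aa for pos, aa in CONSERVED.items() if residue_at(SH_NI, pos) != aa}

print("length:", len(SH_NI))
print("half-cystine positions:", half_cystines)
print("mismatched positions:", mismatches)
```

On the sequence as printed, this reports 48 residues, six half-cystines, and no mismatches among the listed positions, which agrees with the counts stated in the abstract.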

  2. Repeat sequence chromosome specific nucleic acid probes and methods of preparing and using

    DOEpatents

    Weier, H.U.G.; Gray, J.W.

    1995-06-27

    A primer directed DNA amplification method to isolate efficiently chromosome-specific repeated DNA wherein degenerate oligonucleotide primers are used is disclosed. The probes produced are a heterogeneous mixture that can be used with blocking DNA as a chromosome-specific staining reagent, and/or the elements of the mixture can be screened for high specificity, size and/or high degree of repetition among other parameters. The degenerate primers are sets of primers that vary in sequence but are substantially complementary to highly repeated nucleic acid sequences, preferably clustered within the template DNA, for example, pericentromeric alpha satellite repeat sequences. The template DNA is preferably chromosome-specific. Exemplary primers and probes are disclosed. The probes of this invention can be used to determine the number of chromosomes of a specific type in metaphase spreads, in germ line and/or somatic cell interphase nuclei, micronuclei and/or in tissue sections. Also provided is a method to select arbitrarily repeat sequence probes that can be screened for chromosome-specificity. 18 figs.

  3. Repeat sequence chromosome specific nucleic acid probes and methods of preparing and using

    DOEpatents

    Weier, Heinz-Ulrich G.; Gray, Joe W.

    1995-01-01

    A primer directed DNA amplification method to isolate efficiently chromosome-specific repeated DNA wherein degenerate oligonucleotide primers are used is disclosed. The probes produced are a heterogeneous mixture that can be used with blocking DNA as a chromosome-specific staining reagent, and/or the elements of the mixture can be screened for high specificity, size and/or high degree of repetition among other parameters. The degenerate primers are sets of primers that vary in sequence but are substantially complementary to highly repeated nucleic acid sequences, preferably clustered within the template DNA, for example, pericentromeric alpha satellite repeat sequences. The template DNA is preferably chromosome-specific. Exemplary primers and probes are disclosed. The probes of this invention can be used to determine the number of chromosomes of a specific type in metaphase spreads, in germ line and/or somatic cell interphase nuclei, micronuclei and/or in tissue sections. Also provided is a method to select arbitrarily repeat sequence probes that can be screened for chromosome-specificity.

  4. Detection of Nucleic Acids with Graphene Nanopores: Ab Initio Characterization of a Novel Sequencing Device

    NASA Astrophysics Data System (ADS)

    Nelson, Tammie; Zhang, Bo; Prezhdo, Oleg

    2010-03-01

    We report an ab initio study of the interaction of two nucleobases, cytosine and adenine, with a novel graphene nanopore device for detecting the base sequence of a single-stranded nucleic acid (ssDNA or RNA). The nucleobases were inserted into a pore in a graphene nanoribbon, and the electrical current and conductance spectra were calculated as functions of voltage applied across the nanoribbon. The conductance spectra and charge densities were analyzed in the presence of each nucleobase in the graphene nanopore. The results indicate that, due to significant differences in the conductance spectra, the proposed device has adequate sensitivity to discriminate between different nucleotides. Moreover, we show that the conductance spectrum of a nucleotide is not affected by its orientation inside the graphene nanopore. The proposed technique may be extremely useful for real applications in developing ultrafast, low-cost DNA sequencing methods.

  5. Morphological transformation of calcite crystal growth by prismatic "acidic" polypeptide sequences.

    SciTech Connect

    Kim, I; Giocondi, J L; Orme, C A; Collino, J; Evans, J S

    2007-02-13

    Many of the interesting mechanical and materials properties of the mollusk shell are thought to stem from the prismatic calcite crystal assemblies within this composite structure. It is now evident that proteins play a major role in the formation of these assemblies. Recently, a superfamily of 7 conserved prismatic layer-specific mollusk shell proteins, Asprich, were sequenced, and the 42 AA C-terminal sequence region of this protein superfamily was found to introduce surface voids or porosities on calcite crystals in vitro. Using AFM imaging techniques, we further investigate the effect that this 42 AA domain (Fragment-2) and its constituent subdomains, DEAD-17 and Acidic-2, have on the morphology and growth kinetics of calcite dislocation hillocks. We find that Fragment-2 adsorbs on terrace surfaces and pins acute steps, accelerates then decelerates the growth of obtuse steps, forms clusters and voids on terrace surfaces, and transforms calcite hillock morphology from a rhombohedral form to a rounded one. These results mirror yet are distinct from some of the earlier findings obtained for nacreous polypeptides. The subdomains Acidic-2 and DEAD-17 were found to accelerate then decelerate obtuse steps and induce oval rather than rounded hillock morphologies. Unlike DEAD-17, Acidic-2 does form clusters on terrace surfaces and exhibits stronger obtuse velocity inhibition effects than either DEAD-17 or Fragment-2. Interestingly, a 1:1 mixture of both subdomains induces an irregular polygonal morphology to hillocks, and exhibits the highest degree of acute step pinning and obtuse step velocity inhibition. This suggests that there is some interplay between subdomains within an intra (Fragment-2) or intermolecular (1:1 mixture) context, and sequence interplay phenomena may be employed by biomineralization proteins to exert net effects on crystal growth and morphology.

  6. DNA polymorphism in morels: complete sequences of the internal transcribed spacer of genes coding for rRNA in Morchella esculenta (yellow morel) and Morchella conica (black morel).

    PubMed Central

    Wipf, D; Munch, J C; Botton, B; Buscot, F

    1996-01-01

    The internal transcribed spacer (ITS) of the gene coding for rRNA was sequenced in both directions with the gene walking technique in a black morel (Morchella conica) and a yellow morel (M. esculenta) to elucidate the ITS length discrepancy between the two species groups (750-bp ITS in black morels and 1,150-bp ITS in yellow morels). PMID:8795250

  7. Full-length coding sequence for 12 bovine viral diarrhea virus isolates from persistently infected cattle in a feedyard in Kansas

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We report here the full-length coding sequence of 12 bovine viral diarrhea virus (BVDV) isolates from persistently infected cattle from a feedyard in southwest Kansas, USA. These 12 genomes represent the three major genotypes (BVDV 1a, 1b, and 2a) of BVDV currently circulating in the United States....

  8. Association of low-frequency and rare coding-sequence variants with blood lipids and Coronary Heart Disease in 56,000 whites and blacks

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Low-frequency coding DNA sequence variants in the proprotein convertase subtilisin/kexin type 9 gene (PCSK9) lower plasma low-density lipoprotein cholesterol (LDL-C), protect against risk of coronary heart disease (CHD), and have prompted the development of a new class of therapeutics. It is uncerta...

  9. Moving Away from the Reference Genome: Evaluating a Peptide Sequencing Tagging Approach for Single Amino Acid Polymorphism Identifications in the Genus Populus

    SciTech Connect

    Abraham, Paul E; Adams, Rachel M; Tuskan, Gerald A; Hettich, Robert L

    2013-01-01

    The genetic diversity across natural populations of the model organism, Populus, is extensive, containing a single nucleotide polymorphism roughly every 200 base pairs. When deviations from the reference genome occur in coding regions, they can impact protein sequences. Rather than relying on a static reference database to profile protein expression, we employed a peptide sequence tagging (PST) approach capable of decoding the plasticity of the Populus proteome. Using shotgun proteomics data from two genotypes of P. trichocarpa, a tag-based approach enabled the detection of 6,653 unexpected sequence variants. Through manual validation, our study investigated how the most abundant chemical modification (methionine oxidation) could masquerade as a sequence variant (Ala→Ser) when few site-determining ions existed. In fact, precise localization of an oxidation site for peptides with more than one potential placement was indeterminate for 70% of the MS/MS spectra. We demonstrate that additional fragment ions made available by high energy collisional dissociation enhance the robustness of the peptide sequence tagging approach (81% of oxidation events could be exclusively localized to a methionine). We are confident that augmenting fragmentation processes for a PST approach will further improve the identification of single amino acid polymorphism in Populus and potentially other species as well.
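    The ambiguity between methionine oxidation and an Ala→Ser substitution can be illustrated with a small mass calculation. The sketch below is not taken from the record; it assumes standard monoisotopic atomic masses and residue compositions, and the names MONO, RESIDUES and residue_mass are illustrative only. It shows that both events shift a peptide's mass by exactly one oxygen atom, so precursor mass alone cannot tell them apart.

```python
# Monoisotopic atomic masses in daltons (standard values; an assumption of this sketch).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

# Elemental composition of amino acid residues (free amino acid minus H2O).
RESIDUES = {
    "Ala": {"C": 3, "H": 5, "N": 1, "O": 1},
    "Ser": {"C": 3, "H": 5, "N": 1, "O": 2},
}


def residue_mass(name):
    """Monoisotopic mass of a residue from its elemental composition."""
    return sum(MONO[element] * count for element, count in RESIDUES[name].items())


# Mass shift caused by substituting alanine with serine.
ala_to_ser_shift = residue_mass("Ser") - residue_mass("Ala")

# Mass shift caused by oxidizing a methionine side chain (adds one oxygen).
met_oxidation_shift = MONO["O"]

print(f"Ala->Ser shift:      {ala_to_ser_shift:.5f} Da")
print(f"Met oxidation shift: {met_oxidation_shift:.5f} Da")
```

    Because the two shifts coincide, only fragment ions that bracket the modified position can localize the change, which is consistent with the record's emphasis on site-determining ions.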

  10. DNA as a Binary Code: How the Physical Structure of Nucleotide Bases Carries Information

    ERIC Educational Resources Information Center

    McCallister, Gary

    2005-01-01

    The DNA triplet code also functions as a binary code. Because double-ring compounds cannot bind to double-ring compounds in the DNA code, the sequence of bases classified simply as purines or pyrimidines can encode for smaller groups of possible amino acids. This is an intuitive approach to teaching the DNA code. (Contains 6 figures.)
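    The purine/pyrimidine reading described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the author's teaching material: it uses the standard genetic code, encodes purines (A, G) as 1 and pyrimidines (C, T) as 0, and the names to_binary and classes are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def to_binary(codon):
    """Encode each base as 1 (purine: A or G) or 0 (pyrimidine: C or T)."""
    return "".join("1" if base in "AG" else "0" for base in codon)


# Group the amino acids reachable from each three-bit purine/pyrimidine pattern.
classes = defaultdict(set)
for codon, amino_acid in CODON_TABLE.items():
    classes[to_binary(codon)].add(amino_acid)

for pattern in sorted(classes):
    print(pattern, "".join(sorted(classes[pattern])))
```

    Each of the eight binary patterns covers eight codons, so knowing only the purine/pyrimidine class of every base narrows the possibilities to a small group of amino acids.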

  11. Fast computational methods for predicting protein structure from primary amino acid sequence

    DOEpatents

    Agarwal, Pratul Kumar

    2011-07-19

    The present invention provides a method utilizing primary amino acid sequence of a protein, energy minimization, molecular dynamics and protein vibrational modes to predict three-dimensional structure of a protein. The present invention also determines possible intermediates in the protein folding pathway. The present invention has important applications to the design of novel drugs as well as protein engineering. The present invention predicts the three-dimensional structure of a protein independent of size of the protein, overcoming a significant limitation in the prior art.

  12. Sequence of the intron/exon junctions of the coding region of the human androgen receptor gene and identification of a point mutation in a family with complete androgen insensitivity

    SciTech Connect

    Lubahn, D.B.; Simental, J.A.; Higgs, H.N.; Wilson, E.M.; French, F.S.; Brown, T.R.; Migeon, C.J.

    1989-12-01

    Androgens act through a receptor protein (AR) to mediate sex differentiation and development of the male phenotype. The authors have isolated the eight exons in the amino acid coding region of the AR gene from a human X chromosome library. Nucleotide sequences of the AR gene intron/exon boundaries were determined for use in designing synthetic oligonucleotide primers to bracket coding exons for amplification by the polymerase chain reaction. Genomic DNA was amplified from 46, XY phenotypic female siblings with complete androgen insensitivity syndrome. AR binding affinity for dihydrotestosterone in the affected siblings was lower than in normal males, but the binding capacity was normal. Sequence analysis of amplified exons demonstrated within the AR steroid-binding domain (exon G) a single guanine to adenine mutation, resulting in replacement of valine with methionine at amino acid residue 866. As expected, the carrier mother had both normal and mutant AR genes. Thus, a single point mutation in the steroid-binding domain of the AR gene correlated with the expression of an AR protein ineffective in stimulating male sexual development.
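    The record does not state which codon was affected, but the standard genetic code constrains the answer. The sketch below is an inference for illustration only, with hypothetical names: it enumerates every single guanine-to-adenine substitution in the four valine codons and keeps those that yield the methionine codon.

```python
# Valine and methionine codons from the standard genetic code (DNA alphabet).
VALINE_CODONS = ["GTT", "GTC", "GTA", "GTG"]
METHIONINE_CODON = "ATG"


def g_to_a_val_to_met():
    """Return (valine codon, mutated position, mutant codon) for each G->A
    substitution that converts a valine codon into the methionine codon."""
    hits = []
    for codon in VALINE_CODONS:
        for pos, base in enumerate(codon):
            if base != "G":
                continue
            mutant = codon[:pos] + "A" + codon[pos + 1:]
            if mutant == METHIONINE_CODON:
                hits.append((codon, pos, mutant))
    return hits


print(g_to_a_val_to_met())
```

    Under the standard code, only GTG mutated at its first position gives ATG, so a single G-to-A change producing a valine-to-methionine replacement implies that codon.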

  13. Population Genomic Analysis of 962 Whole Genome Sequences of Humans Reveals Natural Selection in Non-Coding Regions

    PubMed Central

    Gazave, Elodie; Chang, Diana; Raj, Srilakshmi; Hunter-Zinck, Haley; Blekhman, Ran; Arbiza, Leonardo; Van Hout, Cris; Morrison, Alanna; Johnson, Andrew D.; Bis, Joshua; Cupples, L. Adrienne; Psaty, Bruce M.; Muzny, Donna; Yu, Jin; Gibbs, Richard A.; Keinan, Alon; Clark, Andrew G.; Boerwinkle, Eric

    2015-01-01

    Whole genome analysis in large samples from a single population is needed to provide adequate power to assess relative strengths of natural selection across different functional components of the genome. In this study, we analyzed next-generation sequencing data from 962 European Americans, and found that as expected approximately 60% of the top 1% of positive selection signals lie in intergenic regions, 33% in intronic regions, and slightly over 1% in coding regions. Several detailed functional annotation categories in intergenic regions showed statistically significant enrichment in positively selected loci when compared to the null distribution of the genomic span of ENCODE categories. There was a significant enrichment of purifying selection signals detected in enhancers, transcription factor binding sites, microRNAs and target sites, but not on lincRNA or piRNAs, suggesting different evolutionary constraints for these domains. Loci in “repressed or low activity regions” and loci near or overlapping the transcription start site were the most significantly over-represented annotations among the top 1% of signals for positive selection. PMID:25807536

  14. Complete mitogenome sequences of four flatfishes (Pleuronectiformes) reveal a novel gene arrangement of L-strand coding genes

    PubMed Central

    2013-01-01

    Background Few mitochondrial gene rearrangements are found in vertebrates and large-scale changes in these genomes occur even less frequently. It is difficult, therefore, to propose a mechanism to account for observed changes in mitogenome structure. Mitochondrial gene rearrangements are usually explained by the recombination model or tandem duplication and random loss model. Results In this study, the complete mitochondrial genomes of four flatfishes, Crossorhombus azureus (blue flounder), Grammatobothus krempfi, Pleuronichthys cornutus, and Platichthys stellatus were determined. A striking finding is that eight genes in the C. azureus mitogenome are located in a novel position, differing from that of available vertebrate mitogenomes. Specifically, the ND6 and seven tRNA genes (the Q, A, C, Y, S1, E, P genes) encoded by the L-strand have been translocated to a position between tRNA-T and tRNA-F though the original order of the genes is maintained. Conclusions These special features are used to suggest a mechanism for C. azureus mitogenome rearrangement. First, a dimeric molecule was formed by two monomers linked head-to-tail, then one of the two sets of promoters lost function and the genes controlled by the disabled promoters became pseudogenes, non-coding sequences, and even were lost from the genome. This study provides a new gene-rearrangement model that accounts for the events of gene-rearrangement in a vertebrate mitogenome. PMID:23962312

  15. Amino-terminal amino acid sequence of the major structural polypeptides of avian retroviruses: sequence homology between reticuloendotheliosis virus p30 and p30s of mammalian retroviruses.

    PubMed Central

    Hunter, E; Bhown, A S; Bennett, J C

    1978-01-01

    The major structural polypeptides, p30 of reticuloendotheliosis virus (REV) (strain T) and p27 of avian sarcoma virus B77, have been compared with regard to amino acid composition, NH2-terminal amino acid sequence, and immunological crossreactions. The amino acid composition of the two polypeptides is distinct, and a comparison of the first 30 NH2-terminal amino acids of REV p30 with that for the first 25 of B77 p27 yields only three homologous residues. In competition radioimmunoassays the polypeptides show no crossreactivity. A comparison of the amino acid composition and NH2-terminal amino acid sequence of REV p30 with those reported for several mammalian retrovirus p30s shows remarkable similarities. Both REV and mammalian p30s contain a large number of polar residues in their amino acid composition and show approximately 40% homology in the first 30 NH2-terminal amino acids. No crossreactivity could be observed, however, in competition radioimmunoassays between Rauscher murine leukemia virus p30 and that of REV. The observations reported here suggest a close evolutionary relationship between REV and the mammalian retroviruses. PMID:208072

  16. A working hypothesis on the interdependent genesis of nucleotide bases, protein amino acids, and primitive genetic code

    NASA Astrophysics Data System (ADS)

    Egami, Fujio

    1981-09-01

    In the course of an experimental approach to chemical evolution in the primeval sea, we have found that the main products from formaldehyde and hydroxylamine are glycine, alanine, serine, aspartic acid etc., and the products from glycine and formaldehyde are serine and aspartic acid. Guanine is found in the two-letter genetic codons of all these amino acids. Based upon the finding and taking into consideration the probable synthetic pathways of nucleotide bases and protein amino acids in the course of chemical evolution and a correlation between the two-letter codons and the number of carbon atoms in the carbon skeleton of amino acids, I have been led to a working hypothesis on the interdependent genesis of nucleotide bases, protein amino acids, and primitive genetic code as shown in Table I. Protein amino acids can be classified into two groups: Purine Group amino acids and Pyrimidine Group amino acids. Purine bases and Pyrimidine bases are predominant in two-letter codons of amino acids belonging to the former and the latter group respectively. Guanine, adenine, and amino acids of the Purine Group may be regarded as synthesized from C1 and C2 compounds and N1 compounds (including C1N1 compounds such as HCN), probably through glycine, in the early stage of chemical evolution. Uracil, cytosine, and amino acids of the Pyrimidine Group may be regarded as synthesized directly or indirectly from three-carbon chain compounds. This synthesis became possible after the accumulation of three-carbon chain compounds and their derivatives in the primeval sea. The Purine Group can be further classified into a Guanine or (Gly+nC1) Subgroup and an Adenine or (Gly+nC2) Subgroup or simply nC2 Subgroup. The Pyrimidine Group can be further classified into a Uracil or C3C6C9 Subgroup and a Cytosine or C5-chain Subgroup (Table I). It is suggested that the primitive genetic code was established by a specific interaction between amino acids and their respective nucleotide bases. The

  17. Purification and amino acid sequence of aminopeptidase P from pig kidney.

    PubMed

    Vergas Romero, C; Neudorfer, I; Mann, K; Schäfer, W

    1995-04-01

    Aminopeptidase P from kidney cortex was purified in high yield (recovery greater than or equal to 20%) by a series of column chromatographic steps after solubilization of the membrane-bound glycoprotein with n-butanol. A coupled enzymic assay, using Gly-Pro-Pro-NH-Nap as substrate and dipeptidyl-peptidase IV as auxiliary enzyme, was used to monitor the purification. The purification procedure yielded two forms of aminopeptidase P differing in their carbohydrate composition (glycoforms). Both enzyme preparations were homogeneous as assessed by SDS/PAGE, silver staining, and isoelectric focusing. Both forms possessed the same substrate specificity, catalysed the same reaction, and consisted of identical protein chains. The amino acid sequence determined by Edman degradation and mass spectrometry consisted of 623 amino acids. Six N-glycosylation sites, all contained in the N-terminal half of the protein, were characterized. PMID:7744038

  18. Sequence Evaluation of FGF and FGFR Gene Conserved Non-Coding Elements in Non-Syndromic Cleft Lip and Palate Cases

    PubMed Central

    Riley, Bridget M.; Murray, Jeffrey C.

    2009-01-01

    Non-syndromic cleft lip and palate (NS CLP) is a complex birth defect resulting from multiple genetic and environmental factors. We have previously reported the sequencing of the coding region of genes in the fibroblast growth factor (FGF) signaling pathway, in which missense and non-sense mutations contribute to approximately 5%–6% of NS CLP cases. In this article we report the sequencing of conserved non-coding elements (CNEs) in and around 11 of the FGF and FGFR genes, which identified 55 novel variants. Seven of the variants are highly conserved among ≥8 species and 31 variants alter transcription factor binding sites, 8 of which are important for craniofacial development. Additionally, 15 NS CLP patients had a combination of coding mutations and CNE variants, suggesting that an accumulation of variants in the FGF signaling pathway may contribute to clefting. PMID:17963255

  19. The sequence of sequencers: The history of sequencing DNA

    PubMed Central

    Heather, James M.; Chain, Benjamin

    2016-01-01

    Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way. PMID:26554401

  20. Draft Genome Sequence of Cupriavidus sp. Strain SK-3, a 4-Chlorobiphenyl- and 4-Chlorobenzoic Acid-Degrading Bacterium

    PubMed Central

    Vilo, Claudia; Benedik, Michael J.; Ilori, Matthew

    2014-01-01

    We report the draft genome sequence of Cupriavidus sp. strain SK-3, which can use 4-chlorobiphenyl and 4-chlorobenzoic acid as the sole carbon source for growth. The draft genome sequence allowed the study of the polychlorinated biphenyl degradation mechanism and the recharacterization of the strain SK-3 as a Cupriavidus species. PMID:24994805

  1. Draft Genome Sequence of Bacillus subtilis subsp. natto Strain CGMCC 2108, a High Producer of Poly-γ-Glutamic Acid

    PubMed Central

    Tan, Siyuan; Su, Anping; Zhang, Chen; Ren, Yuanyuan

    2016-01-01

    Here, we report the 4.1-Mb draft genome sequence of Bacillus subtilis subsp. natto strain CGMCC 2108, a high producer of poly-γ-glutamic acid (γ-PGA). This sequence will provide further help for the biosynthesis of γ-PGA and will greatly facilitate research efforts in metabolic engineering of B. subtilis subsp. natto strain CGMCC 2108. PMID:27231363

  2. New monoclonal antibodies to the Ebola virus glycoprotein: Identification and analysis of the amino acid sequence of the variable domains.

    PubMed

    Panina, A A; Aliev, T K; Shemchukova, O B; Dement'yeva, I G; Varlamov, N E; Pozdnyakova, L P; Bokov, M N; Dolgikh, D A; Sveshnikov, P G; Kirpichnikov, M P

    2016-03-01

    We determined the nucleotide and amino acid sequences of variable domains of three new monoclonal antibodies to the glycoprotein of Ebola virus capsid. The framework and hypervariable regions of immunoglobulin heavy and light chains were identified. The primary structures were confirmed using mass spectrometry analysis. Immunoglobulin database search showed the uniqueness of the sequences obtained. PMID:27193713

  3. Genome Sequence of the Lactic Acid Bacterium Lactococcus lactis subsp. lactis TOMSC161, Isolated from a Nonscalded Curd Pressed Cheese

    PubMed Central

    Velly, H.; Abraham, A.-L.; Loux, V.; Delacroix-Buchet, A.; Fonseca, F.; Bouix, M.

    2014-01-01

    Lactococcus lactis is a lactic acid bacterium used in the production of many fermented foods, such as dairy products. Here, we report the genome sequence of L. lactis subsp. lactis TOMSC161, isolated from nonscalded curd pressed cheese. This genome sequence provides information in relation to dairy environment adaptation. PMID:25377704

  4. Draft Genome Sequence of Bacillus subtilis subsp. natto Strain CGMCC 2108, a High Producer of Poly-γ-Glutamic Acid.

    PubMed

    Tan, Siyuan; Meng, Yonghong; Su, Anping; Zhang, Chen; Ren, Yuanyuan

    2016-01-01

    Here, we report the 4.1-Mb draft genome sequence of Bacillus subtilis subsp. natto strain CGMCC 2108, a high producer of poly-γ-glutamic acid (γ-PGA). This sequence will provide further help for the biosynthesis of γ-PGA and will greatly facilitate research efforts in metabolic engineering of B. subtilis subsp. natto strain CGMCC 2108. PMID:27231363

  5. Formation Sequences of Iron Minerals in the Acidic Alteration Products and Variation of Hydrothermal Fluid Conditions

    NASA Astrophysics Data System (ADS)

    Isobe, H.; Yoshizawa, M.

    2008-12-01

    Iron minerals play an important role in environmental issues not only on Earth but also on other terrestrial planets. Iron mineral species related to alteration products of primary minerals with surface or subsurface fluids are characterized by temperature, acidity and redox conditions of the fluids. Various iron-bearing alteration products occur around fumaroles in geothermal/volcanic areas. In this study, zonal structures of iron minerals in alteration products of the geothermal area are observed to elucidate temporal and spatial variation of hydrothermal fluids. Alteration of the pyroxene-amphibole andesite of Garan-dake volcano, Oita, Japan is driven by acidic hydrothermal fluid, which leaches out elements other than Si and forms cristobalite. Hand specimens with unaltered or weakly altered core and cristobalite crust show various sequences of layers. XRD analysis revealed that the alteration degree is represented by abundance of cristobalite. Intermediately altered layers are characterized by the occurrence of alunite, pyrite, kaolinite, goethite and hematite. A specimen with reddish brown core surrounded by cristobalite-rich white crust has brown colored layers at the boundary of core and the crust. Reddish core is characterized by occurrence of crystalline hematite by XRD. Another hand specimen has light gray core, which represents reduced conditions, and white cristobalite crust with light brown and reddish brown layers of ferric iron minerals between the core and the crust. On the other hand, hornblende crystals, a typical ferrous iron-bearing mineral of the host rock, are well preserved in some samples with strongly decolorized cristobalite-rich groundmass. Hydrothermal alteration experiments of iron-rich basaltic material show that iron mineral species depend on acidity and temperature of the fluid. Oxidation states of the iron-bearing mineral species are strongly influenced by the acidity and redox conditions. Variations of alteration

  6. Computer programs for the characterization of protein coding genes.

    PubMed

    Pierno, G; Barni, N; Candurro, M; Cipollaro, M; Franzè, A; Juliano, L; Macchiato, M F; Mastrocinque, G; Moscatelli, C; Scarlato, V

    1984-01-11

    Computer programs, implemented on a Univac 1100/80 computer system, for the identification and characterization of protein coding genes and for the analysis of nucleic acid sequences, are described. PMID:6546420

  7. Computer programs for the characterization of protein coding genes.

    PubMed Central

    Pierno, G; Barni, N; Candurro, M; Cipollaro, M; Franzè, A; Juliano, L; Macchiato, M F; Mastrocinque, G; Moscatelli, C; Scarlato, V

    1984-01-01

    Computer programs, implemented on a Univac 1100/80 computer system, for the identification and characterization of protein coding genes and for the analysis of nucleic acid sequences, are described. PMID:6546420

  8. Multiple Amino Acid Sequence Alignment Nitrogenase Component 1: Insights into Phylogenetics and Structure-Function Relationships

    PubMed Central

    Howard, James B.; Kechris, Katerina J.; Rees, Douglas C.; Glazer, Alexander N.

    2013-01-01

    Amino acid residues critical for a protein's structure-function are retained by natural selection and these residues are identified by the level of variance in co-aligned homologous protein sequences. The relevant residues in the nitrogen fixation Component 1 α- and β-subunits were identified by the alignment of 95 protein sequences. Proteins were included from species encompassing multiple microbial phyla and diverse ecological niches as well as the nitrogen fixation genotypes, anf, nif, and vnf, which encode proteins associated with cofactors differing at one metal site. After adjusting for differences in sequence length, insertions, and deletions, the remaining >85% of the sequence co-aligned the subunits from the three genotypes. Six Groups, designated Anf, Vnf, and Nif I-IV, were assigned based upon genetic origin, sequence adjustments, and conserved residues. Both subunits subdivided into the same groups. Invariant and single variant residues were identified and were defined as “core” for nitrogenase function. Three species in Group Nif-III, Candidatus Desulforudis audaxviator, Desulfotomaculum kuznetsovii, and Thermodesulfatator indicus, were found to have a seleno-cysteine that replaces one cysteinyl ligand of the 8Fe:7S, P-cluster. Subsets of invariant residues, limited to individual groups, were identified; these unique residues help identify the gene of origin (anf, nif, or vnf) yet should not be considered diagnostic of the metal content of associated cofactors. Fourteen of the 19 residues that compose the cofactor pocket are invariant or single variant; the other five residues are highly variable but do not correlate with the putative metal content of the cofactor. The variable residues are clustered on one side of the cofactor, away from other functional centers in the three dimensional structure. Many of the invariant and single variant residues were not previously recognized as potentially critical and their identification provides the bases

  9. Hfq assists small RNAs in binding to the coding sequence of ompD mRNA and in rearranging its structure

    PubMed Central

    Wroblewska, Zuzanna; Olejniczak, Mikolaj

    2016-01-01

    The bacterial protein Hfq participates in the regulation of translation by small noncoding RNAs (sRNAs). Several mechanisms have been proposed to explain the role of Hfq in the regulation by sRNAs binding to the 5′-untranslated mRNA regions. However, it remains unknown how Hfq affects those sRNAs that target the coding sequence. Here, the contribution of Hfq to the annealing of three sRNAs, RybB, SdsR, and MicC, to the coding sequence of Salmonella ompD mRNA was investigated. Hfq bound to ompD mRNA with tight, subnanomolar affinity. Moreover, Hfq strongly accelerated the rates of annealing of RybB and MicC sRNAs to this mRNA, and it also had a small effect on the annealing of SdsR. The experiments using truncated RNAs revealed that the contributions of Hfq to the annealing of each sRNA were individually adjusted depending on the structures of interacting RNAs. In agreement with that, the mRNA structure probing revealed different structural contexts of each sRNA binding site. Additionally, the annealing of RybB and MicC sRNAs induced specific conformational changes in ompD mRNA consistent with local unfolding of mRNA secondary structure. Finally, the mutation analysis showed that the long AU-rich sequence in the 5′-untranslated mRNA region served as an Hfq binding site essential for the annealing of sRNAs to the coding sequence. Overall, the data showed that the functional specificity of Hfq in the annealing of each sRNA to the ompD mRNA coding sequence was determined by the sequence and structure of the interacting RNAs. PMID:27154968

  10. Draft Genome Sequences of Gluconobacter cerinus CECT 9110 and Gluconobacter japonicus CECT 8443, Acetic Acid Bacteria Isolated from Grape Must

    PubMed Central

    Sainz, Florencia

    2016-01-01

    We report here the draft genome sequences of Gluconobacter cerinus strain CECT 9110 and Gluconobacter japonicus CECT 8443, acetic acid bacteria isolated from grape must. Gluconobacter species are well known for their ability to oxidize sugar alcohols into the corresponding acids. Our objective was to select strains that effectively oxidize d-glucose. PMID:27365351

  11. Swfoldrate: predicting protein folding rates from amino acid sequence with sliding window method.

    PubMed

    Cheng, Xiang; Xiao, Xuan; Wu, Zhi-cheng; Wang, Pu; Lin, Wei-zhong

    2013-01-01

    Protein folding is the process by which a protein progresses from its denatured state to its specific biologically active conformation. Understanding the relationship between sequences and the folding rates of proteins remains an important challenge. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as an input. In this study, long-range and short-range contacts in proteins were used to derive an extended version of the pseudo amino acid composition based on a sliding window method. This method is capable of predicting protein folding rates from the amino acid sequence alone, without the aid of any structural class information. We systematically studied the contributions of individual features to folding rate prediction. Optimal feature selection was performed by combining forward feature selection with sequential backward selection. The method was demonstrated on a large dataset using the jackknife cross-validation test. The predictor was built from numerous physicochemical and statistical features of proteins using a nonlinear support vector machine (SVM) regression model, and it obtained excellent agreement between predicted and experimentally observed folding rates. The correlation coefficient is 0.9313 and the standard error is 2.2692. The prediction server is freely available at http://www.jci-bioinfo.cn/swfrate/input.jsp. PMID:22933332
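    The general idea of a sliding window over a primary sequence can be sketched as follows. This is not the Swfoldrate feature set, which the record does not specify in detail; it is a minimal illustration that averages one per-residue property over overlapping windows. The Kyte-Doolittle hydropathy scale is swapped in as an example property, and the function name window_means and the sample peptide are hypothetical.

```python
# Kyte-Doolittle hydropathy values, used here only as an example per-residue property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_means(sequence, width):
    """Mean property value for every window of `width` consecutive residues."""
    if width < 1 or width > len(sequence):
        raise ValueError("width must be between 1 and the sequence length")
    values = [HYDROPATHY[residue] for residue in sequence]
    return [
        sum(values[start:start + width]) / width
        for start in range(len(values) - width + 1)
    ]


# Hypothetical peptide, used only to show the shape of the output.
features = window_means("MKTAYIAKQR", 5)
print(len(features), [round(value, 2) for value in features])
```

    A sequence of length n yields n - width + 1 window values; a regression model would typically consume summary statistics of such profiles rather than the raw list, since proteins differ in length.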

  12. From amino acid sequence to bioactivity: The biomedical potential of antitumor peptides.

    PubMed

    Blanco-Míguez, Aitor; Gutiérrez-Jácome, Alberto; Pérez-Pérez, Martín; Pérez-Rodríguez, Gael; Catalán-García, Sandra; Fdez-Riverola, Florentino; Lourenço, Anália; Sánchez, Borja

    2016-06-01

    Chemoprevention is the use of natural and/or synthetic substances to block, reverse, or retard the process of carcinogenesis. In this field, the use of antitumor peptides is of interest as (i) these molecules are small in size, (ii) they show good cell diffusion and permeability, (iii) they affect one or more specific molecular pathways involved in carcinogenesis, and (iv) they are not usually genotoxic. We have checked the Web of Science Database (23/11/2015) in order to collect papers reporting on bioactive peptides (1691 registers), which were further filtered by searching for terms such as "antiproliferative," "antitumoral," or "apoptosis" among others. Works reporting the amino acid sequence of an antiproliferative peptide were kept (60 registers), and this was complemented with the peptides included in CancerPPD, an extensive resource for antiproliferative peptides and proteins. Peptides were grouped according to one of the following mechanisms of action: inhibition of cell migration, inhibition of tumor angiogenesis, antioxidative mechanisms, inhibition of gene transcription/cell proliferation, induction of apoptosis, disorganization of tubulin structure, cytotoxicity, or unknown mechanisms. The main mechanisms of action of those antiproliferative peptides with known amino acid sequences are presented and, finally, their potential clinical usefulness and future challenges in their application are discussed. PMID:27010507

  13. The amino acid sequences and activities of synergistic hemolysins from Staphylococcus cohnii.

    PubMed

    Mak, Pawel; Maszewska, Agnieszka; Rozalska, Malgorzata

    2008-10-01

    Staphylococcus cohnii ssp. cohnii and S. cohnii ssp. urealyticus are coagulase-negative staphylococci considered for a long time as unable to cause infections. This situation changed recently and pathogenic strains of these bacteria were isolated from hospital environments, patients and medical staff. Most of the isolated strains were resistant to many antibiotics. The present work describes isolation and characterization of several synergistic peptide hemolysins produced by these bacteria and acting as virulence factors responsible for hemolytic and cytotoxic activities. Amino acid sequences of respective hemolysins from S. cohnii ssp. cohnii (named as H1C, H2C and H3C) and S. cohnii ssp. urealyticus (H1U, H2U and H3U) were identical. Peptides H1 and H3 possessed significant amino acid homology to three synergistic hemolysins secreted by Staphylococcus lugdunensis and to a putative antibacterial peptide produced by Staphylococcus saprophyticus ssp. saprophyticus. On the other hand, hemolysin H2 had a unique sequence. All isolated peptides lysed red cells from different mammalian species and exerted a cytotoxic effect on human fibroblasts. PMID:18752624

  14. Gene control in eukaryotes and the c-value paradox "excess" DNA as an impediment to transcription of coding sequences.

    PubMed

    Zuckerkandl, E

    1976-12-31

    Ways in which control of gene activity may lead to the observed high DNA content per haploid eukaryote genome are examined. It is proposed that deoxyribonucleoprotein (DNP) acts as a barrier to transcription at two distinct structural levels. At the lower level, melting of the nucleosome supercoil (quaternary structure) and of the nucleosomes (tertiary structure) might be brought about by the process of transcription itself. After unwinding the barrier section, the polymerase would eventually reach the structural gene. The transcripts of noncoding sequences, at least as far as their "unique" sequence components are concerned, may thus have filled their main function through the very process of transcription. The possibility of an inverse relationship between the length of the DNP barrier and the rates of transcription of the coding sequences is to some extent supported by available data. Different modes of coordination between the transcription of mRNA and of hnRNA from a single functional unit of gene action (funga) are considered. An analysis of gene control at high structural levels of DNP is made on the basis of other data, in relation to the concepts of eurygenic and stenogenic control. The concept of a euryon is introduced, namely of a set of linked fungas under common eurygenic control. Structure of order higher than quaternary can be inferred to exist in larger chromomeres of polytene chromosomes and in corresponding sections of ordinary chromosomes. Only moderate amounts of highest order interphase euchromatic structure are likely to be able to be accommodated in average chromomeres and none in very thin chromomeres. Puffs are interpreted as the melting of highest order interphase structure, and the absence of puffs during transcription as the absence of this highest order structure in the resting state of the chromomeres. Genes that are constantly active in all tissues may dispense with highest order interphase structure and with the corresponding control

  15. Clostridium sticklandii, a specialist in amino acid degradation: revisiting its metabolism through its genome sequence

    PubMed Central

    2010-01-01

    Background Clostridium sticklandii belongs to a cluster of non-pathogenic proteolytic clostridia which utilize amino acids as carbon and energy sources. Isolated by T.C. Stadtman in 1954, it has been generally regarded as a "gold mine" for novel biochemical reactions and is used as a model organism for studying metabolic aspects such as the Stickland reaction, coenzyme-B12- and selenium-dependent reactions of amino acids. With the goal of revisiting its carbon, nitrogen, and energy metabolism, and comparing studies with other clostridia, its genome has been sequenced and analyzed. Results C. sticklandii is one of the best biochemically studied proteolytic clostridial species. Useful additional information has been obtained from the sequencing and annotation of its genome, which is presented in this paper. Besides, experimental procedures reveal that C. sticklandii degrades amino acids in a preferential and sequential way. The organism prefers threonine, arginine, serine, cysteine, proline, and glycine, whereas glutamate, aspartate and alanine are excreted. Energy conservation is primarily obtained by substrate-level phosphorylation in fermentative pathways. The reactions catalyzed by different ferredoxin oxidoreductases and the exergonic NADH-dependent reduction of crotonyl-CoA point to a possible chemiosmotic energy conservation via the Rnf complex. C. sticklandii possesses both the F-type and V-type ATPases. The discovery of an as yet unrecognized selenoprotein in the D-proline reductase operon suggests a more detailed mechanism for NADH-dependent D-proline reduction. A rather unusual metabolic feature is the presence of genes for all the enzymes involved in two different CO2-fixation pathways: C. sticklandii harbours both the glycine synthase/glycine reductase and the Wood-Ljungdahl pathways. This unusual pathway combination has retrospectively been observed in only four other sequenced microorganisms. Conclusions Analysis of the C. sticklandii genome and

  16. Complete amino acid sequence of the myoglobin from the Pacific spotted dolphin, Stenella attenuata graffmani.

    PubMed

    Jones, B N; Wang, C C; Dwulet, F E; Lehman, L D; Meuth, J L; Bogardt, R A; Gurd, F R

    1979-04-25

    The complete amino acid sequence of the major component myoglobin from the Pacific spotted dolphin, Stenella attenuata graffmani, was determined by the automated Edman degradation of several large peptides obtained by specific cleavage of the protein. The acetimidated apomyoglobin was selectively cleaved at its two methionyl residues with cyanogen bromide and at its three arginyl residues by trypsin. By subjecting four of these peptides and the apomyoglobin to automated Edman degradation, over 80% of the primary structure of the protein was obtained. The remainder of the covalent structure was determined by the sequence analysis of peptides that resulted from further digestion of the central cyanogen bromide fragment. This fragment was cleaved at its glutamyl residues with staphylococcal protease and its lysyl residues with trypsin. The action of trypsin was restricted to the lysyl residues by chemical modification of the single arginyl residue of the fragment with 1,2-cyclohexanedione. The primary structure of this myoglobin proved to be identical with that from the Atlantic bottlenosed dolphin and Pacific common dolphin but differs from the myoglobins of the killer whale and pilot whale at two positions. The above sequence identities and differences reflect the close taxonomic relationship of these five species of Cetacea. PMID:454657

  17. Isolation and amino acid sequences of squirrel monkey (Saimiri sciurea) insulin and glucagon.

    PubMed Central

    Yu, J H; Eng, J; Yalow, R S

    1990-01-01

    It was reported two decades ago that insulin was not detectable in the glucose-stimulated state in Saimiri sciurea, the New World squirrel monkey, by a radioimmunoassay system developed with guinea pig anti-pork insulin antibody and labeled pork insulin. With the same system, reasonable levels were observed in rhesus monkeys and chimpanzees. This suggested that New World monkeys, like the New World hystricomorph rodents such as the guinea pig and the coypu, might have insulins whose sequences differ markedly from those of Old World mammals. In this report we describe the purification and amino acid sequences of squirrel monkey insulin and glucagon. We demonstrate that the substitutions at B29, B27, A2, A4, and A17 of squirrel monkey insulin are identical with those previously found in another New World primate, the owl monkey (Aotus trivirgatus). The immunologic cross-reactivity of this insulin in our immunoassay system is only a few percent of that of human insulin. Squirrel monkey glucagon is identical with the usual glucagon found in Old World mammals, which predicts that the glucagons of other New World monkeys would not differ from the usual Old World mammalian glucagon. It appears that the peptides of the New World monkeys have diverged less from those of the Old World mammals than have those of the New World hystricomorph rodents. The striking improvements in peptide purification and sequencing have the potential for adding new information concerning the evolutionary divergence of species. PMID:2263627

  18. Isolation and amino acid sequences of squirrel monkey (Saimiri sciurea) insulin and glucagon

    SciTech Connect

    Yu, Jinghua; Eng, J.; Yalow, R.S. (City Univ. of New York, NY)

    1990-12-01

    It was reported two decades ago that insulin was not detectable in the glucose-stimulated state in Saimiri sciurea, the New World squirrel monkey, by a radioimmunoassay system developed with guinea pig anti-pork insulin antibody and labeled pork insulin. With the same system, reasonable levels were observed in rhesus monkeys and chimpanzees. This suggested that New World monkeys, like the New World hystricomorph rodents such as the guinea pig and the coypu, might have insulins whose sequences differ markedly from those of Old World mammals. In this report the authors describe the purification and amino acid sequences of squirrel monkey insulin and glucagon. They demonstrate that the substitutions at B29, B27, A2, A4, and A17 of squirrel monkey insulin are identical with those previously found in another New World primate, the owl monkey (Aotus trivirgatus). The immunologic cross-reactivity of this insulin in their immunoassay system is only a few percent of that of human insulin. It appears that the peptides of the New World monkeys have diverged less from those of the Old World mammals than have those of the New World hystricomorph rodents. The striking improvements in peptide purification and sequencing have the potential for adding new information concerning the evolutionary divergence of species.

  19. Binding site discovery from nucleic acid sequences by discriminative learning of hidden Markov models

    PubMed Central

    Maaskola, Jonas; Rajewsky, Nikolaus

    2014-01-01

    We present a discriminative learning method for pattern discovery of binding sites in nucleic acid sequences based on hidden Markov models. Sets of positive and negative example sequences are mined for sequence motifs whose occurrence frequency varies between the sets. The method offers several objective functions, but we concentrate on mutual information of condition and motif occurrence. We perform a systematic comparison of our method and numerous published motif-finding tools. Our method achieves the highest motif discovery performance, while being faster than most published methods. We present case studies of data from various technologies, including ChIP-Seq, RIP-Chip and PAR-CLIP, of embryonic stem cell transcription factors and of RNA-binding proteins, demonstrating practicality and utility of the method. For the alternative splicing factor RBM10, our analysis finds motifs known to be splicing-relevant. The motif discovery method is implemented in the free software package Discrover. It is applicable to genome- and transcriptome-scale data, makes use of available repeat experiments and aside from binary contrasts also more complex data configurations can be utilized. PMID:25389269
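
The objective named in the abstract above, mutual information between condition (positive or negative set) and motif occurrence, can be illustrated with a small calculation. The following is a minimal sketch under stated assumptions, not the Discrover implementation: the function name `mutual_information` and its count-based arguments are hypothetical, and each sequence is simply scored as containing or not containing the motif.

```python
import math


def mutual_information(pos_hits, pos_total, neg_hits, neg_total):
    """Mutual information (in bits) between set membership and motif presence.

    pos_hits / neg_hits: number of sequences containing the motif in each set.
    pos_total / neg_total: number of sequences in each set.
    (Hypothetical helper for illustration only.)
    """
    n = pos_total + neg_total
    motif_total = pos_hits + neg_hits
    # Joint counts of the 2x2 contingency table: (in positive set?, has motif?)
    cells = {
        (True, True): pos_hits,
        (True, False): pos_total - pos_hits,
        (False, True): neg_hits,
        (False, False): neg_total - neg_hits,
    }
    mi = 0.0
    for (is_pos, has_motif), count in cells.items():
        if count == 0:
            continue  # a zero cell contributes nothing to the sum
        p_joint = count / n
        p_cond = (pos_total if is_pos else neg_total) / n
        p_motif = (motif_total if has_motif else n - motif_total) / n
        mi += p_joint * math.log2(p_joint / (p_cond * p_motif))
    return mi
```

A motif that is equally frequent in both sets scores zero, while one that perfectly separates two equally sized sets scores one bit, so candidate motifs could be ranked by this value.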

  20. Clofibrate-induced cytochrome P450-lauric acid omega hydroxylase (P450LA omega): purification, cDNA cloning, sequence and regulation

    SciTech Connect

    Hardwick, J.P.; Song, B.J.; Gonzalez, F.J.

    1986-05-01

    A cytochrome P450 that hydroxylates lauric acid at the 12 position (P450LA omega) was isolated from liver microsomes of clofibrate-treated rats. P450LA omega was immunologically distinct from P450s a,b,c,d,e,f,g,h,j,PB1, and PCN1. Polyclonal antibody against P450LA omega was utilized to screen a λgt11 cDNA library. A clone (pP450LA omega) was isolated and its sequence determined. The P450LA omega mRNA is a minimum of 2387 nts in length and codes for a P450 of Mr 58,222 daltons. This protein shares less than 35% amino acid similarity with P450s b,c,d,e,f,PB1, and PCN1; however, it does contain a hydrophobic amino terminal peptide and a conserved sequence surrounding the Cys residue at position 456, which is similar to other microsomal P450s. P450LA omega is present at high levels in untreated rat kidney and is induced by clofibrate in both kidney and liver. This induction is the result of an accumulation of mRNA through a rapid transcriptional activation of the P450LA gene. Southern blotting data suggest the presence of 2 or 3 genes in the P450LA omega family. This P450 gene family may be associated with arachidonic acid and prostaglandin metabolism in kidney and other tissues.

  1. Individual variation of human S1P₁ coding sequence leads to heterogeneity in receptor function and drug interactions.

    PubMed

    Obinata, Hideru; Gutkind, Sarah; Stitham, Jeremiah; Okuno, Toshiaki; Yokomizo, Takehiko; Hwa, John; Hla, Timothy

    2014-12-01

    Sphingosine 1-phosphate receptor 1 (S1P₁), an abundantly-expressed G protein-coupled receptor which regulates key vascular and immune responses, is a therapeutic target in autoimmune diseases. Fingolimod/Gilenya (FTY720), an oral medication for relapsing-remitting multiple sclerosis, targets S1P₁ receptors on immune and neural cells to suppress neuroinflammation. However, suppression of endothelial S1P₁ receptors is associated with cardiac and vascular adverse effects. Here we report the genetic variations of the S1P₁ coding region from exon sequencing of >12,000 individuals and their functional consequences. We conducted functional analyses of 14 nonsynonymous single nucleotide polymorphisms (SNPs) of the S1PR1 gene. One SNP mutant (Arg¹²⁰ to Pro) failed to transmit sphingosine 1-phosphate (S1P)-induced intracellular signals such as calcium increase and activation of p44/42 MAPK and Akt. Two other mutants (Ile⁴⁵ to Thr and Gly³⁰⁵ to Cys) showed normal intracellular signals but impaired S1P-induced endocytosis, which made the receptor resistant to FTY720-induced degradation. Another SNP mutant (Arg¹³ to Gly) demonstrated protection from coronary artery disease in a high cardiovascular risk population. Individuals with this mutation showed a significantly lower percentage of multi-vessel coronary obstruction in a risk factor-matched case-control study. This study suggests that individual genetic variations of S1P₁ can influence receptor function and, therefore, infer differential disease risks and interaction with S1P₁-targeted therapeutics. PMID:25293589

  2. Amino acid sequence analysis and characterization of a ribonuclease from starfish Asterias amurensis.

    PubMed

    Motoyoshi, Naomi; Kobayashi, Hiroko; Itagaki, Tadashi; Inokuchi, Norio

    2016-09-01

    The aim of this study was to phylogenetically characterize the location of the RNase T2 enzyme in the starfish (Asterias amurensis). We isolated an RNase T2 ribonuclease (RNase Aa) from the ovaries of starfish and determined its amino acid sequence by protein chemistry and cloning cDNA encoding RNase Aa. The isolated protein had 231 amino acid residues, a predicted molecular mass of 25,906 Da, and an optimal pH of 5.0. RNase Aa preferentially released guanylic acid from the RNA. The catalytic sites of the RNase T2 family are conserved in RNase Aa; furthermore, the distribution of the cysteine residues in RNase Aa is similar to that in other animal and plant T2 RNases. RNase Aa is cleaved at two points: 21 residues from the N-terminus and 29 residues from the C-terminus; however, both fragments may remain attached to the protein via disulfide bridges, leading to the maintenance of its conformation, as suggested by circular dichroism spectrum analysis. The phylogenetic analysis revealed that starfish RNase Aa is evolutionarily an intermediate between protozoan and oyster RNases. PMID:26920046

  3. Reasons for the occurrence of the twenty coded protein amino acids

    NASA Technical Reports Server (NTRS)

    Weber, A. L.; Miller, S. L.

    1981-01-01

    Factors involved in the selection of the 20 protein L-alpha-amino acids during chemical evolution and the early stages of Darwinian evolution are discussed. The selection is considered on the basis of the availability in the primitive ocean, function in proteins, the stability of the amino acid and its peptides, stability to racemization, and stability on the transfer RNA. It is concluded that aspartic acid, glutamic acid, arginine, lysine, serine and possibly threonine are the best choices for acidic, basic and hydroxy amino acids. The hydrophobic amino acids are reasonable choices, except for the puzzling absences of alpha-amino-n-butyric acid, norvaline and norleucine. The choices of the sulfur and aromatic amino acids seem reasonable, but are not compelling. Asparagine and glutamine are apparently not primitive. If life were to arise on another planet, it would be expected that the catalysts would be poly-alpha-amino acids and that about 75% of the amino acids would be the same as on the earth.

  4. Accurate prediction of the toxicity of benzoic acid compounds in mice via oral without using any computer codes.

    PubMed

    Keshavarz, Mohammad Hossein; Gharagheizi, Farhad; Shokrolahi, Arash; Zakinejad, Sajjad

    2012-10-30

    Most benzoic acid derivatives are toxic and may cause serious public health and environmental problems. Two novel, simple and reliable models are introduced for desk calculation of the oral LD50 toxicity of benzoic acid compounds in mice, whose answers merit at least as much reliance as one could attach to more complex outputs. They require only elemental composition and molecular fragments without using any computer codes. The first model is based on only the number of carbon and hydrogen atoms, which can be improved by several molecular fragments in the second model. For 57 benzoic compounds, where the computed results of quantitative structure-toxicity relationship (QSTR) were recently reported, the predictions of the two simple models are more reliable than the QSTR computations. The present simple method was also tested on a further 324 benzoic acid compounds, including complex molecular structures, which confirms the good forecasting ability of the second model. PMID:22959133

  5. Electrophysiological responses of Xenopus oocytes to amino acids: criteria for expression of injected mRNA coding chemoreceptors.

    PubMed

    Etoh, M; Yoshii, K

    1994-10-01

    Responses of endogenous transporters/receptors of Xenopus oocytes to L-alanine, L-arginine, L-leucine and L-serine were investigated under voltage clamp conditions. (a) Concentration-response relations for the amino acids followed Langmuir's adsorption isotherm. (b) The neutral amino acids required Na+ to elicit the responses, whereas L-arginine did not. (c) The responses to L-alanine decreased with decreasing pH and became undetectable at pH 5.5. The present experiments supply criteria to judge if the oocytes translate exogenous mRNA coding taste or olfactory receptor proteins for the amino acids, the best characterized stimuli, especially in fishes. PMID:7956120
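
Finding (a) above says the concentration-response relations followed Langmuir's adsorption isotherm, which has the saturating form R = R_max · C / (K + C). The sketch below only illustrates that functional form; the function name `langmuir_response` and the parameter names and values are assumptions for illustration and are not taken from the study.

```python
def langmuir_response(conc, r_max, k_half):
    """Response predicted by a Langmuir isotherm.

    conc:   amino acid concentration (any unit, same as k_half)
    r_max:  saturating response (hypothetical value)
    k_half: concentration giving half-maximal response (hypothetical value)
    """
    return r_max * conc / (k_half + conc)


if __name__ == "__main__":
    # Illustrative parameters only; not measured values from the abstract.
    r_max, k_half = 100.0, 2.0
    for conc in (0.0, 0.5, 2.0, 8.0, 32.0):
        print(f"C = {conc:5.1f}  ->  R = {langmuir_response(conc, r_max, k_half):6.2f}")
```

The response is zero without ligand, half-maximal when the concentration equals `k_half`, and approaches `r_max` as the concentration grows.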

  6. Nucleotide and deduced amino acid sequences of the nucleocapsid protein of the virulent A75/17-CDV strain of canine distemper virus.

    PubMed

    Stettler, M; Zurbriggen, A

    1995-05-01

    Virus persistence is essential in the chronic inflammatory canine distemper virus (CDV)-induced demyelinating disease. In the case of CDV there is a close association between persistence and virulence. Virulent CDV isolated from dogs with distemper shows immediate persistence in primary dog brain cell cultures (DBCC) and in different cell lines. We have evidence that the nucleocapsid (NP) protein plays an important role in the development of persistence. The NP-protein, the most abundant structural virus protein, also influences virus assembly and has some regulatory functions in virus transcription and replication. In this study we compared the nucleotide and deduced amino acid sequence of a virulent CDV strain (A75/17-CDV) to a culture-attenuated non-virulent strain (OP-CDV). Viral RNA was extracted from DBCC infected with virulent CDV. Virulent CDV retains its in vivo properties, such as virulence and ability to cause demyelination, when propagated in these DBCC. The viral RNA was reverse transcribed and the resulting cDNA amplified by polymerase chain reaction for subsequent cloning. The nucleotide sequences of these clones were determined by the dideoxy chain termination method. The number of nucleotides and the putative NP-protein of the virulent strain matched the attenuated CDV strain. We observed a total of 105 nucleotide differences. Three were localised within the 3' and five within the 5' non-coding region of the NP-gene. The 97 nucleotide changes within the coding region resulted in 22 amino acid differences. 10 of these amino acid (AA) modifications were within the N-terminal region (AA 1 to 159) and 12 within the C-terminal area (AA 351 to 523).(ABSTRACT TRUNCATED AT 250 WORDS) PMID:8588315

  7. Massively parallel sequencing of the entire control region and targeted coding region SNPs of degraded mtDNA using a simplified library preparation method.

    PubMed

    Lee, Eun Young; Lee, Hwan Young; Oh, Se Yoon; Jung, Sang-Eun; Yang, In Seok; Lee, Yang-Han; Yang, Woo Ick; Shin, Kyoung-Jin

    2016-05-01

    The application of next-generation sequencing (NGS) to forensic genetics is being explored by an increasing number of laboratories because of the potential of high-throughput sequencing for recovering genetic information from multiple markers and multiple individuals in a single run. A cumbersome and technically challenging library construction process is required for NGS. In this study, we propose a simplified library preparation method for mitochondrial DNA (mtDNA) analysis that involves two rounds of PCR amplification. In the first round of multiplex PCR, six fragments covering the entire mtDNA control region and 22 fragments covering interspersed single nucleotide polymorphisms (SNPs) in the coding region that can be used to determine global haplogroups and East Asian haplogroups were amplified using template-specific primers with read sequences. In the following step, indices and platform-specific sequences for the MiSeq® system (Illumina) were added by PCR. The barcoded library produced using this simplified workflow was successfully sequenced on the MiSeq system using the MiSeq Reagent Nano Kit v2. A total of 0.4 GB of sequences, 80.6% with base quality of >Q30, were obtained from 12 degraded DNA samples and mapped to the revised Cambridge Reference Sequence (rCRS). A relatively even read count was obtained for all amplicons, with an average coverage of 5200× and a less than three-fold read count difference between amplicons per sample. Control region sequences were successfully determined, and all samples were assigned to the relevant haplogroups. In addition, enhanced discrimination was observed by adding coding region SNPs to the control region in in silico analysis. Because the developed multiplex PCR system amplifies small-sized amplicons (<250 bp), NGS analysis using the library preparation method described here allows mtDNA analysis using highly degraded DNA samples. PMID:26844917

  8. Full Genome Virus Detection in Fecal Samples Using Sensitive Nucleic Acid Preparation, Deep Sequencing, and a Novel Iterative Sequence Classification Algorithm

    PubMed Central

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J.; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assemble 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis. PMID:24695106

  9. Evolutionary connections of biological kingdoms based on protein and nucleic acid sequence evidence

    NASA Technical Reports Server (NTRS)

    Dayhoff, M. O.

    1983-01-01

    Prokaryotic and eukaryotic evolutionary trees are developed from protein and nucleic-acid sequences by the methods of numerical taxonomy. Trees are presented for bacterial ferredoxins, 5S ribosomal RNA, c-type cytochromes, cytochromes c2 and c', and 5.8S ribosomal RNA; the implications for early evolution are discussed; and a composite tree showing the branching of the anaerobes, aerobes, archaebacteria, and eukaryotes is shown. Single lines are found for all oxygen-evolving photosynthetic forms and for the salt-loving and high-temperature forms of archaebacteria. It is argued that the eukaryote mitochondria, chloroplasts, and cytoplasmic host material are descended from free-living prokaryotes that formed symbiotic associations, with more than one symbiotic event involved in the evolution of each organelle.

  10. Trypsin inhibitors from ridged gourd (Luffa acutangula Linn.) seeds: purification, properties, and amino acid sequences.

    PubMed

    Haldar, U C; Saha, S K; Beavis, R C; Sinha, N K

    1996-02-01

    Two trypsin inhibitors, LA-1 and LA-2, have been isolated from ridged gourd (Luffa acutangula Linn.) seeds and purified to homogeneity by gel filtration followed by ion-exchange chromatography. The isoelectric point is at pH 4.55 for LA-1 and at pH 5.85 for LA-2. The Stokes radius of each inhibitor is 11.4 Å. The fluorescence emission spectrum of each inhibitor is similar to that of free tyrosine. The bimolecular rate constant of acrylamide quenching is 1.0 × 10⁹ M⁻¹ sec⁻¹ for LA-1 and 0.8 × 10⁹ M⁻¹ sec⁻¹ for LA-2 and that of K2HPO4 quenching is 1.6 × 10¹¹ M⁻¹ sec⁻¹ for LA-1 and 1.2 × 10¹¹ M⁻¹ sec⁻¹ for LA-2. Analysis of the circular dichroic spectra yields 40% alpha-helix and 60% beta-turn for LA-1 and 45% alpha-helix and 55% beta-turn for LA-2. Inhibitors LA-1 and LA-2 consist of 28 and 29 amino acid residues, respectively. They lack threonine, alanine, valine, and tryptophan. Both inhibitors strongly inhibit trypsin by forming enzyme-inhibitor complexes at a molar ratio of unity. A chemical modification study suggests the involvement of arginine of LA-1 and lysine of LA-2 in their reactive sites. The inhibitors are very similar in their amino acid sequences, and show sequence homology with other squash family inhibitors. PMID:8924202

  11. Microfluidic platform for isolating nucleic acid targets using sequence specific hybridization

    PubMed Central

    Wang, Jingjing; Morabito, Kenneth; Tang, Jay X.; Tripathi, Anubhav

    2013-01-01

    The separation of target nucleic acid sequences from biological samples has emerged as a significant process in today's diagnostics and detection strategies. In addition to the possible clinical applications, the fundamental understanding of target and sequence specific hybridization on surface modified magnetic beads is of high value. In this paper, we describe a novel microfluidic platform that utilizes a mobile magnetic field in static microfluidic channels, where single stranded DNA (ssDNA) molecules are isolated via nucleic acid hybridization. We first established efficient isolation of biotinylated capture probe (BP) using streptavidin-coated magnetic beads. Subsequently, we investigated the hybridization of target ssDNA with BP bound to beads and explained these hybridization kinetics using a dual-species kinetic model. The number of hybridized target ssDNA molecules was determined to be about 6.5 times less than that of BP on the bead surface, due to steric hindrance effects. The hybridization of target ssDNA with non-complementary BP bound to beads was also examined, and non-specific hybridization was found to be insignificant. Finally, we demonstrated highly efficient capture and isolation of target ssDNA in the presence of non-target ssDNA, where as little as 1% target ssDNA can be detected from a mixture. The microfluidic method described in this paper is broadly applicable, especially towards point-of-care biological diagnostic platforms that require binding and separation of known target biomolecules, such as RNA, ssDNA, or protein. PMID:24404041

  12. Human Retroviruses and AIDS. A compilation and analysis of nucleic acid and amino acid sequences: I--II; III--V

    SciTech Connect

    Myers, G.; Korber, B.; Wain-Hobson, S.; Smith, R.F.; Pavlakis, G.N.

    1993-12-31

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (I) HIV and SIV Nucleotide Sequences; (II) Amino Acid Sequences; (III) Analyses; (IV) Related Sequences; and (V) Database Communications. Information within all the parts is updated at least twice in each year, which accounts for the modes of binding and pagination in the compendium.

  13. Numeral series hidden in the distribution of atomic mass of amino acids to codon domains in the genetic code.

    PubMed

    Wohlin, Åsa

    2015-03-21

    The distribution of codons in the nearly universal genetic code is a long discussed issue. At the atomic level, the numeral series 2x² (x = 5 to 0) lies behind electron shells and orbitals. Numeral series appear in formulas for spectral lines of hydrogen. The question here was if some similar scheme could be found in the genetic code. A table of 24 codons was constructed (synonyms counted as one) for 20 amino acids, four of which have two different codons. An atomic mass analysis was performed, built on common isotopes. It was found that a numeral series 5 to 0 with exponent 2/3 times 10² revealed detailed congruency with codon-grouped amino acid side-chains, simultaneously with the division on atom kinds, further with main 3rd base groups, backbone chains and with codon-grouped amino acids in relation to their origin from glycolysis or the citrate cycle. Hence, it is proposed that this series in a dynamic way may have guided the selection of amino acids into codon domains. Series with simpler exponents also showed noteworthy correlations with the atomic mass distribution on main codon domains; especially the 2x²-series times a factor 16 appeared as a conceivable underlying level, both for the atomic mass and charge distribution. Furthermore, it was found that atomic mass transformations between numeral systems, possibly interpretable as dimension degree steps, connected the atomic mass of codon bases with codon-grouped amino acids and with the exponent 2/3-series in several astonishing ways. Thus, it is suggested that they may be part of a deeper reference system. PMID:25623487
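
For orientation, the two numeral series mentioned in the abstract above can be tabulated directly. This is a minimal sketch under an assumed reading of the abstract: the first series is taken as 2x² for x = 5 down to 0, and the second as x^(2/3) × 10² over the same x values. The function names are hypothetical, and the sketch does not reproduce the paper's atomic mass analysis.

```python
def shell_series(xs=range(5, -1, -1)):
    """Series 2*x**2 for x = 5..0 (matches electron shell capacities 2n^2)."""
    return [2 * x**2 for x in xs]


def exponent_series(xs=range(5, -1, -1), exponent=2 / 3, factor=100):
    """Assumed reading of 'series 5 to 0 with exponent 2/3 times 10^2'."""
    return [factor * x**exponent for x in xs]


if __name__ == "__main__":
    print(shell_series())
    print([round(v, 1) for v in exponent_series()])
```

Multiplying each term of `shell_series()` by 16 would give the "2x²-series times a factor 16" that the abstract mentions as a possible underlying level.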

  14. Variations in the coding and regulatory sequences of the angiogenin (ANG) gene are not associated to ALS (amyotrophic lateral sclerosis) in the Italian population.

    PubMed

    Corrado, Lucia; Battistini, Stefania; Penco, Silvana; Bergamaschi, Laura; Testa, Lucia; Ricci, Claudia; Giannini, Fabio; Greco, Giuseppe; Patrosso, Maria Cristina; Pileggi, Simona; Causarano, Renzo; Mazzini, Letizia; Momigliano-Richiardi, Patricia; D'Alfonso, Sandra

    2007-07-15

    Potentially causative missense variations in the ANG gene and a positive association with the synonymous rs11701-G substitution were detected mainly in Irish and Scottish ALS patients. We screened 262 Italian SOD1 negative ALS patients (250 sporadic) and 415 matched controls for sequence variations in the coding, 3'/5' UTR and 5' flanking (642 bp) regions of the ANG gene. We identified 53 sequence variations, of which 46 were new, 20 had a minor allele frequency (MAF) ≥0.01 and only three were localised in the coding sequence, namely the missense I46V, identified in one patient and two controls, and the synonymous G86G and T97T corresponding to rs11701 and rs2228653. None of the detected SNPs or of their haplotypic combinations was significantly associated with ALS susceptibility or clinical features. In conclusion, we did not detect the association with rs11701-G or with any other newly detected variation in the ANG regulatory region. Furthermore we did not identify potentially causal mutations in the coding region. PMID:17462671

  15. Lactic acid production from potato peel waste by anaerobic sequencing batch fermentation using undefined mixed culture.

    PubMed

    Liang, Shaobo; McDonald, Armando G; Coats, Erik R

    2015-11-01

    Lactic acid (LA) is a necessary industrial feedstock for producing the bioplastic, polylactic acid (PLA), which is currently produced by pure culture fermentation of food carbohydrates. This work presents an alternative route that produces LA from potato peel waste (PPW) by anaerobic fermentation in a sequencing batch reactor (SBR) inoculated with undefined mixed culture from a municipal wastewater treatment plant. A statistical design of experiments approach was employed using a set of 0.8 L SBRs with gelatinized PPW at a solids content range from 30 to 50 g L⁻¹ and a solids retention time (SRT) of 2-4 days for yield and productivity optimization. The maximum LA production yield of 0.25 g g⁻¹ PPW and highest productivity of 125 mg g⁻¹ d⁻¹ were achieved. A scale-up SBR trial using neat gelatinized PPW (at 80 g L⁻¹ solids content) at the 3 L scale was employed and the highest LA yield of 0.14 g g⁻¹ PPW and a productivity of 138 mg g⁻¹ d⁻¹ were achieved with a 1 d SRT. PMID:25708409

  16. Bacterial community compositions in sediment polluted by perfluoroalkyl acids (PFAAs) using Illumina high-throughput sequencing.

    PubMed

    Sun, Yajun; Wang, Tieyu; Peng, Xiawei; Wang, Pei; Lu, Yonglong

    2016-06-01

    The characterization of bacterial community compositions and the change in perfluoroalkyl acids (PFAAs) along a natural river distribution system were explored in the present study. Illumina high-throughput sequencing was used to explore bacterial community diversity and structure in sediment polluted by PFAAs from the Xiaoqing River, the area with concentrated fluorochemical facilities in China. The concentration of PFAAs was in the range of 8.44-465.60 ng/g dry weight (dw) in sediment. Perfluorooctanoic acid (PFOA) was the dominant PFAA in all samples, which accounted for 94.2 % of total PFAAs. High-level PFOA could lead to an obvious increase in relative abundance of Proteobacteria, ε-Proteobacteria, Thiobacillus, and Sulfurimonas and the decrease in relative abundance of other bacteria. Redundancy analysis revealed that PFOA played an important role in the formation of bacterial community, and PFOA at higher concentration could reduce the diversity of bacterial community. When the concentration of PFOA was below 100 ng/g dw in sediment, no significant effect on microbial community structure was observed. Thiobacillus and Sulfurimonas were positively correlated with the concentration of PFOA, suggesting that both genera were resistant to PFOA contamination. PMID:26780047

  17. Mass spectrometric detection of the amino acid sequence polymorphism of the hepatitis C virus antigen.

    PubMed

    Kaysheva, A L; Ivanov, Yu D; Frantsuzov, P A; Krohin, N V; Pavlova, T I; Uchaikin, V F; Konev, V A; Kovalev, O B; Ziborov, V S; Archakov, A I

    2016-03-01

    A method for detection and identification of the hepatitis C virus antigen (HCVcoreAg) in human serum with consideration for possible amino acid substitutions is proposed. The method is based on a combination of biospecific capturing and concentrating of the target protein on the surface of the chip for atomic force microscope (AFM chip) with subsequent protein identification by tandem mass spectrometric (MS/MS) analysis. Biospecific AFM-capturing of viral particles containing HCVcoreAg from serum samples was performed using AFM chips with monoclonal antibodies (anti-HCVcore) covalently immobilized on the surface. Biospecific complexes were registered and counted by AFM. Subsequent MS/MS analysis allowed reliable identification of the HCVcoreAg in the complexes formed on the AFM chip surface. Analysis of MS/MS spectra that took into account the possible polymorphisms in the amino acid sequence of the HCVcoreAg enabled us to increase the number of identified peptides. PMID:26773170

  18. Peptide sequencing by using a combination of partial acid hydrolysis and fast-atom-bombardment mass spectrometry.

    PubMed Central

    De Angelis, F; Botta, M; Ceccarelli, S; Nicoletti, R

    1986-01-01

    To overcome the limit of the intensity of ions carrying sequence information in structural determinations of peptides by fast-atom-bombardment m.s., we have developed a method that consists in taking spectra of the peptide acid hydrolysates at different hydrolysis times. Peaks correspond to the oligomers arising from the peptide partial hydrolysis. The sequence can then be identified from the structurally overlapping fragments. PMID:2428356
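    The idea of reading a sequence from structurally overlapping fragments can be illustrated with a minimal sketch. This is not the authors' procedure: it is a generic greedy overlap merge, and the fragment strings, the function names (overlap, drop_contained, assemble) and the min_overlap threshold are all hypothetical. Real FAB-MS data would also need care with residues of equal mass (for example Leu and Ile) and with repeated motifs, which this sketch ignores.

```python
from typing import List


def overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def drop_contained(fragments: List[str]) -> List[str]:
    """Remove duplicates and fragments that lie wholly inside a longer one."""
    ordered = sorted(set(fragments), key=lambda f: (-len(f), f))
    return [f for i, f in enumerate(ordered)
            if not any(f in g for g in ordered[:i])]


def assemble(fragments: List[str], min_overlap: int = 2) -> List[str]:
    """Greedily merge the pair with the largest overlap until none qualifies."""
    pool = drop_contained(fragments)
    while len(pool) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(pool):
            for j, b in enumerate(pool):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k < min_overlap:
            break  # remaining fragments cannot be joined with confidence
        merged = pool[best_i] + pool[best_j][best_k:]
        rest = [f for n, f in enumerate(pool) if n not in (best_i, best_j)]
        pool = drop_contained(rest + [merged])
    return pool


# Hypothetical one-letter fragments, standing in for oligomers observed
# at different hydrolysis times (not data from the record above).
fragments = ["GAVL", "VLIK", "IKF", "AVLI"]
contigs = assemble(fragments)
print(contigs)
```

    If more than one string is returned, the fragments did not overlap enough to give a single sequence under the chosen threshold.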

  19. Canine preprorelaxin: nucleic acid sequence and localization within the canine placenta.

    PubMed

    Klonisch, T; Hombach-Klonisch, S; Froehlich, C; Kauffold, J; Steger, K; Steinetz, B G; Fischer, B

    1999-03-01

    Employing uteroplacental tissue at Day 35 of gestation, we determined the nucleic acid sequence of canine preprorelaxin using reverse transcription- and rapid amplification of cDNA ends-polymerase chain reaction. Canine preprorelaxin cDNA consisted of 534 base pairs encoding a protein of 177 amino acids with a signal peptide of 25 amino acids (aa), a B domain of 35 aa, a C domain of 93 aa, and an A domain of 24 aa. The putative receptor binding region in the N'-terminal part of the canine relaxin B domain GRDYVR contained two substitutions from the classical motif (E-->D and L-->Y). Canine preprorelaxin shared highest homology with porcine and equine preprorelaxin. Northern analysis revealed a 1-kilobase transcript present in total RNA of canine uteroplacental tissue but not of kidney tissue. Uteroplacental tissue from two bitches each at Days 30 and 35 of gestation was studied by in situ hybridization to localize relaxin mRNA. Immunohistochemistry for relaxin, cytokeratin, vimentin, and von Willebrand factor was performed on uteroplacental tissue at Day 30 of gestation. The basal cell layer at the core of the chorionic villi was devoid of relaxin mRNA and immunoreactive relaxin or vimentin but was immunopositive for cytokeratin and identified as cytotrophoblast cells. The cell layer surrounding the chorionic villi displayed specific hybridization signals for relaxin mRNA and immunoreactivity for relaxin and cytokeratin but not for vimentin, and was identified as syncytiotrophoblast. Those areas of the chorioallantoic tissue with the most intense relaxin immunoreactivity were highly vascularized, as demonstrated by immunoreactive von Willebrand factor expressed on vascular endothelium. The uterine glands and nonplacental uterine areas of the canine zonary girdle placenta were devoid of relaxin mRNA and relaxin. We conclude that the syncytiotrophoblast is the source of relaxin in the canine placenta. PMID:10026098
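    The domain lengths quoted in this record can be checked with a few lines of arithmetic. The sketch below assumes, which the abstract does not state, that the 534 base pairs refer to the open reading frame including one stop codon; the variable names are illustrative only.

```python
# Domain lengths (amino acids) as quoted in the abstract.
domains = {
    "signal peptide": 25,
    "B domain": 35,
    "C domain": 93,
    "A domain": 24,
}

total_aa = sum(domains.values())      # residues in preprorelaxin
coding_bp = 3 * total_aa              # three nucleotides per codon
with_stop_bp = coding_bp + 3          # ASSUMPTION: one stop codon is counted

print(total_aa, coding_bp, with_stop_bp)
```

    Under that assumption the four domains account for all 177 residues and the reading frame for all 534 base pairs.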

  20. Purification and partial amino acid sequence of the chloroplast cytochrome b-559.

    PubMed

    Widger, W R; Cramer, W A; Hermodson, M; Meyer, D; Gullifor, M

    1984-03-25

    The hydrophobic cytochrome b-559, purified from unstacked, ethanol-washed spinach thylakoid membranes, using extraction with 2% Triton X-100 in 4 M urea and three chromatographic steps in the presence of protease inhibitors, has a dominant band on sodium dodecyl sulfate-urea gels corresponding to Mr = 10,000. The yield of this preparation is 30-50% (5-10 mg) starting with 600 mg of chlorophyll. The heme content yields a calculated molecular weight of no more than 17,500/heme, and perhaps somewhat smaller after correction for impurities. The Mr = 10,000 band is stained by the tetramethylbenzidine-H2O2 heme reagent on lithium dodecyl sulfate gels run at 0 degrees C. The Mr = 10,000 protein, further separated by high performance liquid chromatography, contains a unique NH2 terminus that is not blocked, and the amino acid sequence for the first 27 residues is NH2-Ser-Gly-Ser-Thr-Gly-Glu-Arg-Ser-Phe-Ala-Asp-Ile-Ile-Thr-Ser-Ile-Arg-Tyr-Trp-Val-Ile-X-Ser-Ile-Thr-Ile-Pro...COOH. Approximately 55% of the amino acids are hydrophobic, based on amino acid analysis of the Mr = 10,000 peptide, which also indicated the presence of at least one histidine. Only one cytochrome b-559 component could be identified, whose yield indicated that it arises from a single b-559 protein in chloroplasts corresponding to the in situ high potential cytochrome of the chloroplast photosystem II. PMID:6706983
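    As a small consistency check, the three-letter sequence quoted above can be split and counted. The sketch treats "X" as a position that was not identified, which is the usual convention but is an assumption here; the variable names are illustrative.

```python
from collections import Counter

# N-terminal sequence as quoted in the abstract (three-letter codes).
reported = (
    "Ser-Gly-Ser-Thr-Gly-Glu-Arg-Ser-Phe-Ala-Asp-Ile-Ile-Thr-Ser-"
    "Ile-Arg-Tyr-Trp-Val-Ile-X-Ser-Ile-Thr-Ile-Pro"
)

residues = reported.split("-")
composition = Counter(residues)

# 1-based positions of residues reported as "X" (assumed: not identified).
unassigned = [pos for pos, res in enumerate(residues, start=1) if res == "X"]

print(len(residues), unassigned, composition.most_common(3))
```

    The count agrees with the 27 residues stated in the abstract, with a single unassigned position.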