Sample records for complex sequence homologies

  1. Gentle Masking of Low-Complexity Sequences Improves Homology Search

    PubMed Central

    Frith, Martin C.

    2011-01-01

    Detection of sequences that are homologous, i.e. descended from a common ancestor, is a fundamental task in computational biology. This task is confounded by low-complexity tracts (such as atatatatatat), which arise frequently and independently, causing strong similarities that are not homologies. There has been much research on identifying low-complexity tracts, but little research on how to treat them during homology search. We propose to find homologies by aligning sequences with “gentle” masking of low-complexity tracts. Gentle masking means that the match score involving a masked letter is min(0, S), where S is the unmasked score. Gentle masking slightly but noticeably improves the sensitivity of homology search (compared to “harsh” masking), without harming specificity. We show examples in three useful homology search problems: detection of NUMTs (nuclear copies of mitochondrial DNA), recruitment of metagenomic DNA reads to reference genomes, and pseudogene detection. Gentle masking is currently the best way to treat low-complexity tracts during homology search. PMID:22205972
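
    The gentle-masking rule can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation. It assumes that lowercase letters mark masked (low-complexity) positions, uses a hypothetical +1/-1 match/mismatch scheme, scores only a gapless alignment, and models "harsh" masking simply as scoring every masked pair like a mismatch.

```python
# Minimal sketch of gentle vs. harsh masking on a gapless alignment.
# Assumptions (not from the paper): lowercase = masked letter,
# hypothetical +1 match / -1 mismatch scores, and "harsh" masking
# approximated as scoring any masked pair as a mismatch.
MATCH, MISMATCH = 1, -1


def unmasked_score(x: str, y: str) -> int:
    """Score S of a letter pair, ignoring masking."""
    return MATCH if x.upper() == y.upper() else MISMATCH


def pair_score(x: str, y: str, gentle: bool = True) -> int:
    """Apply min(0, S) if either letter is masked (gentle), else a mismatch (harsh)."""
    s = unmasked_score(x, y)
    if x.islower() or y.islower():  # at least one letter is masked
        return min(0, s) if gentle else MISMATCH
    return s


def gapless_score(a: str, b: str, gentle: bool = True) -> int:
    """Sum pair scores over two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned (equal length)")
    return sum(pair_score(x, y, gentle) for x, y in zip(a, b))


if __name__ == "__main__":
    a = "ACGTatatatatACGT"
    b = "ACGTatatatatACGT"
    print("gentle:", gapless_score(a, b, gentle=True))   # prints gentle: 8
    print("harsh:", gapless_score(a, b, gentle=False))   # prints harsh: 0
```

    Under gentle masking the identical masked tract contributes nothing, while genuine mismatches inside it are still penalized; in this simplified harsh model the same tract costs one point per letter.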

  2. Not all transmembrane helices are born equal: Towards the extension of the sequence homology concept to membrane proteins

    PubMed Central

    2011-01-01

    Background Sequence homology considerations widely used to transfer functional annotation to uncharacterized protein sequences require special precautions in the case of non-globular sequence segments including membrane-spanning stretches composed of non-polar residues. Simple, quantitative criteria are desirable for identifying transmembrane helices (TMs) that must be included in or should be excluded from the starting sequence segments in similarity searches aimed at finding distant homologues. Results We found that there are two types of TMs in membrane-associated proteins. On the one hand, there are so-called simple TMs with elevated hydrophobicity, low sequence complexity and extraordinary enrichment in long aliphatic residues. They merely serve as membrane-anchoring devices. In contrast, so-called complex TMs have lower hydrophobicity, higher sequence complexity and some functional residues. These TMs have additional roles besides membrane anchoring, such as intra-membrane complex formation, ligand binding or catalysis. Simple and complex TMs can occur in both single- and multi-membrane-spanning proteins, in essentially any topology. Whereas simple TMs have the potential to confuse searches for sequence homologues and to generate unrelated hits with seemingly convincing statistical significance, complex TMs contain essential evolutionary information. Conclusion For extending the homology concept to membrane proteins, we provide a necessary quantitative criterion to distinguish simple TMs (and a sufficient criterion for complex TMs) in query sequences prior to their use in homology searches, based on an assessment of the hydrophobicity and sequence complexity of the TM sequence segments. Reviewers This article was reviewed by Shamil Sunyaev, L. Aravind and Arcady Mushegian. PMID:22024092
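
    The two quantities used here to separate simple from complex TMs, hydrophobicity and sequence complexity, can be computed for a candidate segment as sketched below. This is an illustration only. It assumes the Kyte-Doolittle scale for hydrophobicity and uses the Shannon entropy of the residue composition (in bits) as a stand-in complexity measure. The abstract gives neither the paper's actual scales nor its cut-off values, so no classification threshold is applied, and the example segments are hypothetical.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy scale (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(segment: str) -> float:
    """Average Kyte-Doolittle value over the segment (standard residues only)."""
    seg = segment.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seg) / len(seg)


def composition_entropy(segment: str) -> float:
    """Shannon entropy (bits) of residue composition; low values = low complexity."""
    seg = segment.upper()
    n = len(seg)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seg).values())


if __name__ == "__main__":
    # Hypothetical example segments, not taken from the paper.
    for name, seg in [("aliphatic-rich", "LLVLLALLVLLAILLVLLAL"),
                      ("mixed", "GLFAYNSRWTQLPVHEGIKD")]:
        print(f"{name}: H={mean_hydrophobicity(seg):.2f}, "
              f"entropy={composition_entropy(seg):.2f} bits")
```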

  3. Sequence basis of Barnacle Cement Nanostructure is Defined by Proteins with Silk Homology

    NASA Astrophysics Data System (ADS)

    So, Christopher R.; Fears, Kenan P.; Leary, Dagmar H.; Scancella, Jenifer M.; Wang, Zheng; Liu, Jinny L.; Orihuela, Beatriz; Rittschof, Dan; Spillmann, Christopher M.; Wahl, Kathryn J.

    2016-11-01

    Barnacles adhere by producing a mixture of cement proteins (CPs) that organize into a permanently bonded layer displayed as nanoscale fibers. These cement proteins share no homology with any other marine adhesives, and a common sequence basis that defines how nanostructures function as adhesives remains undiscovered. Here we demonstrate that a significant, previously unidentified portion of acorn barnacle cement consists of low-complexity proteins; they are organized into repetitive sequence blocks and maintain homology to silk motifs. Proteomic analysis of aggregate bands from PAGE gels reveals an abundance of Gly/Ala/Ser/Thr repeats, exemplified by a prominent, previously unidentified 43 kDa protein in the solubilized adhesive. Low-complexity regions found throughout the cement proteome, as well as multiple lysyl oxidases and peroxidases, establish homology with silk-associated materials such as fibroin, silk gum sericin, and pyriform spidroins from spider silk. Distinct primary structures defined by homologous domains shed light on how barnacles use low complexity in nanofibers to enable adhesion, and serve as a starting point for unraveling the molecular architecture of a robust and unique class of adhesive nanostructures.

  4. Evaluating the efficacy of a structure-derived amino acid substitution matrix in detecting protein homologs by BLAST and PSI-BLAST.

    PubMed

    Goonesekere, Nalin Cw

    2009-01-01

    The large numbers of protein sequences generated by whole genome sequencing projects require rapid and accurate methods of annotation. The detection of homology through computational sequence analysis is a powerful tool in determining the complex evolutionary and functional relationships that exist between proteins. Homology search algorithms employ amino acid substitution matrices to detect similarity between protein sequences. The substitution matrices in common use today are constructed using sequences aligned without reference to protein structure. Here we present amino acid substitution matrices constructed from the alignment of a large number of protein domain structures from the Structural Classification of Proteins (SCOP) database. We show that when incorporated into the homology search algorithms BLAST and PSI-BLAST, the structure-based substitution matrices enhance the efficacy of detecting remote homologs.
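
    Substitution matrices of this kind are log-odds tables. Each entry compares how often two residues are observed aligned with how often they would pair by chance. The sketch below shows the standard BLOSUM-style construction from a list of aligned sequence pairs. The abstract does not state whether the SCOP-derived matrices used exactly this normalization, scaling, or weighting, and the half-bit scale factor of 2 is an assumption.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs, scale=2.0):
    """Build a symmetric log-odds substitution table from aligned pairs.

    aligned_pairs: iterable of (seq1, seq2) of equal length; '-' marks a gap.
    Returns {(a, b): score} for observed residue pairs, where
    score = round(scale * log2(q_ab / e_ab)), q_ab is the observed pair
    frequency and e_ab the frequency expected by chance.
    """
    counts = Counter()
    for s1, s2 in aligned_pairs:
        for x, y in zip(s1, s2):
            if x != "-" and y != "-":
                counts[tuple(sorted((x, y)))] += 1  # unordered pair
    total = sum(counts.values())
    q = {pair: c / total for pair, c in counts.items()}

    # Background frequency p_a = q_aa + sum over b != a of q_ab / 2.
    p = Counter()
    for (a, b), f in q.items():
        p[a] += f / 2
        p[b] += f / 2

    matrix = {}
    for (a, b), f in q.items():
        expected = p[a] * p[a] if a == b else 2 * p[a] * p[b]
        s = round(scale * math.log2(f / expected))
        matrix[(a, b)] = s
        matrix[(b, a)] = s
    return matrix


if __name__ == "__main__":
    # Hypothetical structure-based alignments.
    pairs = [("ACDE-G", "ACDEKG"), ("ACDQ", "SCEQ")]
    m = log_odds_matrix(pairs)
    print(m[("A", "A")], m[("D", "E")])
```

    Residue pairs that never occur in the input are simply absent here; a usable matrix needs a large alignment set or pseudocounts so that every pair receives a finite score.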

  5. Conservation of coevolving protein interfaces bridges prokaryote–eukaryote homologies in the twilight zone

    PubMed Central

    Rodriguez-Rivas, Juan; Marsili, Simone; Juan, David; Valencia, Alfonso

    2016-01-01

    Protein–protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein–protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein–protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein–protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach. PMID:27965389
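
    The basic idea behind coevolution analysis is that residue pairs in contact tend to change together across species. The simplest measure of this is the mutual information between two columns of a paired (concatenated) alignment, sketched below. This is not necessarily the authors' method: approaches such as direct-coupling analysis additionally correct for indirect correlations and sequence redundancy, which plain mutual information does not. The example columns are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (bits) between two alignment columns.

    col_i and col_j hold one residue per sequence, in the same sequence
    order (e.g. column i of protein A and column j of its partner B in a
    paired alignment built from the same organisms).
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must cover the same sequences")
    n = len(col_i)
    f_i, f_j = Counter(col_i), Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in f_ij.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi


if __name__ == "__main__":
    # Hypothetical columns: a K/E pair swapping in step, as in a salt bridge.
    print(column_mutual_information("KKEE", "EEKK"))  # 1.0 (perfect covariation)
    print(column_mutual_information("KKEE", "EKEK"))  # 0.0 (independent)
```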

  6. Conservation of coevolving protein interfaces bridges prokaryote-eukaryote homologies in the twilight zone.

    PubMed

    Rodriguez-Rivas, Juan; Marsili, Simone; Juan, David; Valencia, Alfonso

    2016-12-27

    Protein-protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein-protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein-protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein-protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach.

  7. HomPPI: a class of sequence homology based protein-protein interface prediction methods

    PubMed Central

    2011-01-01

    Background Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results We studied more than 300,000 pairwise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non-partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins.
The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/. Conclusions Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners. PMID:21682895
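
    The three figures quoted for HomPPI can be computed per residue from predicted and actual interface labels. The sketch below assumes that CC is the Matthews correlation coefficient and uses the textbook definitions of sensitivity, TP/(TP+FN), and specificity, TN/(TN+FP). Interface-prediction papers do not all define specificity the same way (some use TP/(TP+FP), i.e. precision), so the sketch reports precision as well. The labels in the example are hypothetical.

```python
import math


def interface_metrics(predicted, actual):
    """Per-residue metrics for binary interface predictions (1 = interface)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)

    def ratio(num, den):
        return num / den if den else 0.0  # avoid division by zero

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "CC": ratio(tp * tn - fp * fn, denom),  # Matthews correlation coefficient
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


if __name__ == "__main__":
    actual = [1, 1, 0, 0, 1, 0, 0, 0]      # hypothetical true interface residues
    predicted = [1, 0, 0, 0, 1, 1, 0, 0]   # hypothetical predictions
    print(interface_metrics(predicted, actual))
```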

  8. Amino acid sequence of the human fibronectin receptor

    PubMed Central

    1987-01-01

    The amino acid sequence deduced from cDNA of the human placental fibronectin receptor is reported. The receptor is composed of two subunits: an alpha subunit of 1,008 amino acids, which is processed into two polypeptides disulfide-bonded to one another, and a beta subunit of 778 amino acids. Each subunit has a hydrophobic segment near its COOH terminus. This and other sequence features suggest a structure for the receptor in which the hydrophobic segments serve as transmembrane domains anchoring each subunit to the membrane and dividing each into a large ectodomain and a short cytoplasmic domain. The alpha subunit ectodomain has five sequence elements homologous to consensus Ca2+-binding sites of several calcium-binding proteins, and the beta subunit contains a fourfold repeat strikingly rich in cysteine. The alpha subunit sequence is 46% homologous to the alpha subunit of the vitronectin receptor. The beta subunit is 44% homologous to the human platelet adhesion receptor subunit IIIa and 47% homologous to a leukocyte adhesion receptor beta subunit. The high degree of homology (85%) of the beta subunit with one of the polypeptides of a chicken adhesion receptor complex referred to as integrin complex strongly suggests that the latter polypeptide is the chicken homologue of the fibronectin receptor beta subunit. These receptor subunit homologies define a superfamily of adhesion receptors. The availability of the entire protein sequence for the fibronectin receptor will facilitate studies on the functions of these receptors. PMID:2958481

  9. Mitochondrial Genome Sequence of the Legume Vicia faba

    PubMed Central

    Negruk, Valentine

    2013-01-01

    The number of sequenced plant mitochondrial genomes exceeds two dozen. However, a detailed comparative study of different phylogenetic branches requires more plant mitochondrial genomes to be sequenced. This article presents sequencing data and a comparative analysis of the mitochondrial DNA (mtDNA) of the legume Vicia faba. The size of the circular mitochondrial master chromosome of V. faba cultivar Broad Windsor was estimated as 588,000 bp, with a genome complexity of 387,745 bp and 52 conserved mitochondrial genes: 32 protein-coding genes, 3 rRNA genes, and 17 tRNA genes. Six tRNA genes were highly homologous to chloroplast genome sequences. In addition to the 52 conserved genes, 114 unique open reading frames (ORFs) were found: 36 without significant homology to any known protein; 29 with homology to the Medicago truncatula nuclear genome and to other plant mitochondrial ORFs; and 49 not homologous to M. truncatula but with significant homology to other plant mitochondrial or nuclear ORFs. In general, the unique ORFs showed very low homology to those of closely related legumes, but several sequence homologies were found between V. faba, Beta vulgaris, Nicotiana tabacum, Vitis vinifera, and even the monocots Oryza sativa and Zea mays. Most likely these ORFs arose independently during angiosperm evolution (Kubo and Mikami, 2007; Kubo and Newton, 2008). Computational analysis revealed that, in total, about 45% of the V. faba mtDNA sequence is homologous to the Medicago truncatula nuclear genome (more than to any sequenced plant mitochondrial genome), and 35% of this homology, in segments ranging from a few dozen to 12,806 bp, is located on chromosome 1. Apparently, mitochondrial rrn5, rrn18, rps10, ATP synthase subunit alpha, cox2, and tRNA sequences are part of transcribed nuclear mosaic ORFs. PMID:23675376
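
    Counting ORFs such as the 114 reported here starts with a scan for start-to-stop reading frames on both strands. The sketch below is a generic ORF scanner, not the pipeline used in the paper. It assumes ATG starts and TAA/TAG/TGA stops (the standard genetic code) and ignores RNA editing, which in plant mitochondria can create or remove start and stop codons. The minimum length min_codons is a hypothetical parameter, only the first ATG per open frame is reported, and coordinates are given on whichever strand was scanned.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def scan_strand(seq: str, strand: str, min_codons: int):
    """Yield (strand, start, end) for ATG...stop ORFs in the three frames."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start opens a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:  # codons before the stop
                    yield (strand, start, i + 3)
                start = None


def find_orfs(seq: str, min_codons: int = 100):
    """ORFs on the given strand ('+') and on its reverse complement ('-')."""
    seq = seq.upper()
    return (list(scan_strand(seq, "+", min_codons))
            + list(scan_strand(reverse_complement(seq), "-", min_codons)))


if __name__ == "__main__":
    print(find_orfs("ATGAAATAA", min_codons=1))  # [('+', 0, 9)]
```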

  10. Evidence for the presence of key chlorophyll-biosynthesis-related proteins in the genus Rubrobacter (Phylum Actinobacteria) and its implications for the evolution and origin of photosynthesis.

    PubMed

    Gupta, Radhey S; Khadka, Bijendra

    2016-02-01

    Homologs showing a high degree of sequence similarity to the three subunits of the protochlorophyllide oxidoreductase enzyme complex (viz. BchL, BchN, and BchB), which plays a central role in chlorophyll-bacteriochlorophyll (Bchl) biosynthesis, are uniquely found in photosynthetic organisms. The results of BLAST searches and homology modeling presented here show that proteins exhibiting a high degree of sequence and structural similarity to the BchB and BchN proteins are also present in organisms from the high G+C Gram-positive phylum of Actinobacteria, specifically in members of the genus Rubrobacter (R. xylanophilus and R. radiotolerans). The results presented exclude the possibility that the observed BLAST hits are for subunits of the nitrogenase complex or the chlorin reductase complex. The branching in phylogenetic trees and the sequence characteristics of the Rubrobacter BchB/BchN homologs indicate that these homologs are distinct from those found in other photosynthetic bacteria and that they may represent ancestral forms of the BchB/BchN proteins. Although a homolog showing a high degree of sequence similarity to the BchL protein was not detected in Rubrobacter, another protein present in these bacteria, belonging to the ParA/Soj/MinD family, exhibits a high degree of structural similarity to BchL. In addition to the BchB/BchN homologs, Rubrobacter species also contain homologs showing a high degree of sequence similarity to different subunits of magnesium chelatase (BchD, BchH, and BchI) as well as proteins showing significant similarity to the BchP and BchG proteins. Interestingly, no homologs corresponding to the BchX, BchY, and BchZ proteins were detected in the Rubrobacter species. These results provide the first suggestive evidence that some form of photosynthesis either exists or was anciently present within the phylum Actinobacteria (high G+C Gram-positive) in members of the genus Rubrobacter.
The significance of these results concerning the origin of the Bchl-based photosynthesis is also discussed.

  11. Reassociation and hybridization properties of DNAs from several species of fish

    USGS Publications Warehouse

    Gharrett, A.J.; Simon, R.C.; McIntyre, J.D.

    1977-01-01

    Reassociation and hybridization properties from spectrophotometric studies of DNAs from 10 species of fish indicate: (1) great diversity in the amounts of repeated sequences in the genomes of different species (more specialized fish had less redundancy); (2) large differences in the complexities of the DNAs (more specialized fish had less information); and (3) little homology between sequences of remotely related species but substantial homology between sequences of closely related species.
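
    Reassociation (C0t) analysis of this kind rests on second-order kinetics, in which the fraction of DNA still single-stranded is C/C0 = 1/(1 + k·C0t). Repeated sequences show up as fast-reassociating components, and the C0t at half-reassociation that a component would show as pure DNA grows in proportion to its sequence complexity. The sketch below encodes these textbook relations. It is not the procedure of the study, and the two-component genome, rate constants, and reference DNA used for calibration are hypothetical.

```python
def single_stranded_fraction(c0t, components):
    """Ideal second-order reassociation summed over kinetic components.

    c0t: DNA concentration x time (mol*s/L).
    components: list of (fraction_of_genome, k), with k in L/(mol*s) as
    observed in the whole-genome mixture; fractions should sum to 1.
    """
    return sum(f / (1.0 + k * c0t) for f, k in components)


def kinetic_complexity(fraction, k, ref_c0t_half, ref_complexity_bp):
    """Complexity (bp) of one component, calibrated against a reference DNA.

    The component's C0t1/2 in the mixture is 1/k; multiplying by its
    genome fraction gives the C0t1/2 it would show as pure DNA, which
    scales linearly with complexity.
    """
    pure_c0t_half = fraction / k
    return ref_complexity_bp * pure_c0t_half / ref_c0t_half


if __name__ == "__main__":
    genome = [(0.3, 50.0), (0.7, 0.05)]  # hypothetical repeated + unique parts
    for c0t in (0.01, 1.0, 100.0):
        print(c0t, round(single_stranded_fraction(c0t, genome), 3))
```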

  12. Caught in the act: the lifetime of synaptic intermediates during the search for homology on DNA

    PubMed Central

    Mani, Adam; Braslavsky, Ido; Arbel-Goren, Rinat; Stavans, Joel

    2010-01-01

    Homologous recombination plays pivotal roles in DNA repair and in the generation of genetic diversity. To locate homologous target sequences at which strand exchange can occur within a timescale that a cell’s biology demands, a single-stranded DNA-recombinase complex must search among a large number of sequences on a genome by forming synapses with chromosomal segments of DNA. A key element in the search is the time it takes for the two sequences of DNA to be compared, i.e. the synapse lifetime. Here, we visualize for the first time fluorescently tagged individual synapses formed by RecA, a prokaryotic recombinase, and measure their lifetime as a function of synapse length and of differences in sequence between the participating DNAs. Surprisingly, lifetimes can be ∼10 s long when the DNAs are fully heterologous, and much longer for partial homology, consistent with ensemble FRET measurements. Synapse lifetime increases rapidly as the length of a region of full homology at either the 3′- or 5′-end of the invading single-stranded DNA increases above 30 bases. A few mismatches can dramatically reduce the lifetime of synapses formed with nearly homologous DNAs. These results suggest the need for facilitated homology search mechanisms to locate homology successfully within the timescales observed in vivo. PMID:20044347

  13. DNA sequence alignment by microhomology sampling during homologous recombination

    PubMed Central

    Qi, Zhi; Redding, Sy; Lee, Ja Yil; Gibb, Bryan; Kwon, YoungHo; Niu, Hengyao; Gaines, William A.; Sung, Patrick

    2015-01-01

    Summary Homologous recombination (HR) mediates the exchange of genetic information between sister or homologous chromatids. During HR, members of the RecA/Rad51 family of recombinases must somehow search through vast quantities of DNA sequence to align and pair ssDNA with a homologous dsDNA template. Here we use single-molecule imaging to visualize Rad51 as it aligns and pairs homologous DNA sequences in real-time. We show that Rad51 uses a length-based recognition mechanism while interrogating dsDNA, enabling robust kinetic selection of 8-nucleotide (nt) tracts of microhomology, which kinetically confines the search to sites with a high probability of being a homologous target. Successful pairing with a 9th nucleotide coincides with an additional reduction in binding free energy and subsequent strand exchange occurs in precise 3-nt steps, reflecting the base triplet organization of the presynaptic complex. These findings provide crucial new insights into the physical and evolutionary underpinnings of DNA recombination. PMID:25684365
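
    A quick calculation shows why an 8-nt tract is a useful filter: under a uniform, random base composition, a given 8-mer occurs at any one position with probability 4^-8 = 1/65,536. The sketch below computes that expectation and, as a purely computational analogy (Rad51 does not build an index), lists where 8-nt tracts of an ssDNA probe occur exactly in a target. The uniform-composition assumption, the single-strand scan, the 4.6 Mb target size, and the example sequences are hypothetical simplifications.

```python
def expected_random_matches(target_length: int, k: int = 8) -> float:
    """Expected chance occurrences of one k-mer on one strand (uniform bases)."""
    return (target_length - k + 1) * 4.0 ** -k


def microhomology_sites(probe: str, target: str, k: int = 8):
    """Return (probe_offset, target_offset) pairs sharing an exact k-nt tract."""
    index = {}
    for i in range(len(target) - k + 1):
        index.setdefault(target[i:i + k], []).append(i)
    return [(j, i)
            for j in range(len(probe) - k + 1)
            for i in index.get(probe[j:j + k], [])]


if __name__ == "__main__":
    print(microhomology_sites("ACGTACGTA", "TTACGTACGTT"))  # [(0, 2)]
    # Roughly 70 chance matches per 8-mer on one strand of a 4.6 Mb target.
    print(round(expected_random_matches(4_600_000)))
```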

  14. Resolution of model Holliday junctions by yeast endonuclease: effect of DNA structure and sequence.

    PubMed Central

    Parsons, C A; Murchie, A I; Lilley, D M; West, S C

    1989-01-01

    The resolution of Holliday junctions in DNA involves specific cleavage at or close to the site of the junction. A nuclease from Saccharomyces cerevisiae cleaves model Holliday junctions in vitro by introducing nicks in regions of duplex DNA adjacent to the crossover point. In previous studies [Parsons and West (1988) Cell, 52, 621-629] it was shown that cleavage occurred within homologous arm sequences with precise symmetry across the junction. In contrast, junctions with heterologous arm sequences were cleaved asymmetrically. In this work, we have studied the effect of sequence changes and base modification upon the site of cleavage. It is shown that the specificity of cleavage is unchanged provided that perfect homology is maintained between opposing arm sequences. However, in the absence of homology, cleavage depends upon sequence context and is affected by minor changes such as base modification. These data support the proposed mechanism for cleavage of a Holliday junction, which requires homologous alignment of arm sequences in an enzyme-DNA complex as a prerequisite for symmetrical cleavage by the yeast endonuclease. PMID:2653810

  15. Myocilin, a Component of a Membrane-Associated Protein Complex Driven by a Homologous Q-SNARE Domain

    PubMed Central

    Dismuke, W. Michael; McKay, Brian S.; Stamer, W. Daniel

    2012-01-01

    Myocilin is a widely expressed protein with no known function; however, mutations in myocilin appear to manifest uniquely as ocular hypertension and the blinding disease glaucoma. Using the protein homology/analogy recognition engine (PHYRE), we find that the olfactomedin domain of myocilin is similar in sequence motif and structure to a six-bladed kelch repeat motif, based on the known crystal structures of such proteins. Additionally, using sequence analysis we identify a coiled-coil segment of myocilin with homology to human Q-SNARE proteins. Using COS-7 cells expressing full-length human myocilin and a version lacking the C-terminal olfactomedin domain, we identified a membrane-associated protein complex containing myocilin by hydrodynamic analysis. The myocilin construct that included the coiled-coil but lacked the olfactomedin domain formed complexes similar to the full-length protein, indicating that the coiled-coil domain of myocilin is sufficient for myocilin to bind to the large detergent-resistant complex. In human retina and retinal pigment epithelium, which express myocilin, we detected the protein in a large, SDS-resistant, membrane-associated complex. We characterized the hydrodynamic properties of myocilin in human tissues as either a 15S complex with an Mr of 405,000–440,000 and a slightly elongated globular shape similar to known SNARE complexes, or a 6.4S dimer with an Mr of 108,000. By identifying the Q-SNARE homology within the second coil of myocilin and documenting its participation in a SNARE-like complex, we provide evidence of a SNARE domain-containing protein associated with a human disease. PMID:22463803

  16. Escherichia coli promoter sequences predict in vitro RNA polymerase selectivity.

    PubMed

    Mulligan, M E; Hawley, D K; Entriken, R; McClure, W R

    1984-01-11

    We describe a simple algorithm for computing a homology score for Escherichia coli promoters based on DNA sequence alone. The homology score was related to 31 values, measured in vitro, of RNA polymerase selectivity, which we define as the product K_B k_2, the apparent second-order rate constant for open complex formation. We found that promoter strength could be predicted to within a factor of ±4.1 in K_B k_2 over a range of 10^4 in the same parameter. The quantitative evaluation was linked to an automated (Apple II) procedure for searching and evaluating possible promoters in DNA sequence files.
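
    A promoter homology score compares a candidate sequence with the E. coli sigma-70 consensus: the -35 hexamer TTGACA and the -10 hexamer TATAAT, separated by a spacer of about 17 bp. The sketch below is a simplified, unweighted version of that idea and not the Mulligan et al. algorithm, whose details the abstract does not give. It counts consensus matches in both hexamers and subtracts a hypothetical penalty of one point per base of spacer deviation from 17.

```python
MINUS35 = "TTGACA"   # sigma-70 -35 consensus
MINUS10 = "TATAAT"   # sigma-70 -10 consensus


def matches(window: str, consensus: str) -> int:
    """Number of positions where the window agrees with the consensus."""
    return sum(a == b for a, b in zip(window, consensus))


def best_promoter(seq: str, spacers=range(15, 20), optimal=17, penalty=1):
    """Return (score, pos_minus35, pos_minus10) of the best candidate, or None.

    score = matches to TTGACA + matches to TATAAT - penalty * |spacer - optimal|
    (the spacer penalty is a hypothetical choice for illustration).
    """
    seq = seq.upper()
    best = None
    for i in range(len(seq) - 5):
        box35 = seq[i:i + 6]
        for spacer in spacers:
            j = i + 6 + spacer
            if j + 6 > len(seq):
                continue  # -10 box would run past the end
            score = (matches(box35, MINUS35) + matches(seq[j:j + 6], MINUS10)
                     - penalty * abs(spacer - optimal))
            if best is None or score > best[0]:
                best = (score, i, j)
    return best


if __name__ == "__main__":
    candidate = "TTGACA" + "N" * 17 + "TATAAT"   # perfect consensus promoter
    print(best_promoter(candidate))               # (12, 0, 23)
```

    Weighting each position by how conserved it is across known promoters, rather than counting every match equally, is the natural refinement of this scheme.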

  17. On the necessity of dissecting sequence similarity scores into segment-specific contributions for inferring protein homology, function prediction and annotation

    PubMed Central

    2014-01-01

    Background Protein sequence similarities to any type of non-globular segment (coiled coils, low-complexity regions, transmembrane regions, long loops, etc., where either positional sequence conservation is the result of a very simple, physically induced pattern or rather integral sequence properties are critical) are pertinent sources of mistaken homologies. Regrettably, these considerations regularly escape attention in large-scale annotation studies since, often, there is no substitute for manual handling of these cases. Quantitative criteria are required to suppress events of function annotation transfer as a result of false homology assignments. Results The sequence homology concept is based on the similarity comparison between the structural elements, the basic building blocks for conferring the overall fold of a protein. We propose to dissect the total similarity score into fold-critical and other, remaining contributions and suggest that, for a valid homology statement, the fold-relevant score contribution should at least be significant on its own. As part of the article, we provide the DissectHMMER software program for dissecting HMMER2/3 scores into segment-specific contributions. We show that DissectHMMER reproduces HMMER2/3 scores with sufficient accuracy and that it is useful in automated decisions about homology for instructive sequence examples. To generalize the dissection concept for cases without 3D structural information, we find that a dissection based on alignment quality is an appropriate surrogate. The approach was applied to a large-scale study of SMART and PFAM domains in the space of seed sequences and in the space of UniProt/SwissProt.
Conclusions Sequence similarity score dissection with regard to fold-critical and other contributions systematically suppresses false hits and, additionally, recovers previously obscured homology relationships, such as the one between aquaporins and formate/nitrite transporters that, so far, was only supported by structure comparison. PMID:24890864
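
    The dissection idea can be illustrated with per-position alignment scores. Sum the contributions that fall in fold-critical segments separately from those in, say, a low-complexity or transmembrane segment, and accept homology only if the fold-critical part is significant on its own. The sketch below is not DissectHMMER. It assumes per-position score contributions (in bits) and segment boundaries are already available, and the significance threshold and example scores are hypothetical placeholders.

```python
def dissect_score(position_scores, segments):
    """Sum per-position scores within each labeled, half-open segment [start, end)."""
    return {label: sum(position_scores[start:end])
            for label, (start, end) in segments.items()}


def fold_part_is_significant(position_scores, segments, fold_labels, threshold):
    """True if the fold-critical segments alone reach the threshold score."""
    parts = dissect_score(position_scores, segments)
    return sum(parts[label] for label in fold_labels) >= threshold


if __name__ == "__main__":
    # Hypothetical alignment: weak fold-core signal, strong low-complexity signal.
    scores = [0.5] * 40 + [1.5] * 20
    segments = {"fold_core": (0, 40), "low_complexity": (40, 60)}
    print(dissect_score(scores, segments))  # {'fold_core': 20.0, 'low_complexity': 30.0}
    print(sum(scores))                      # 50.0 in total
    print(fold_part_is_significant(scores, segments, ["fold_core"], 25.0))  # False
```

    Here the total of 50 would pass a hypothetical 25-bit cut-off, but the fold-critical part alone would not, which is the kind of hit the dissection is meant to reject.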

  18. A benchmark testing ground for integrating homology modeling and protein docking.

    PubMed

    Bohnuud, Tanggis; Luo, Lingqi; Wodak, Shoshana J; Bonvin, Alexandre M J J; Weng, Zhiping; Vajda, Sandor; Schueler-Furman, Ora; Kozakov, Dima

    2017-01-01

    Protein docking procedures carry out the task of predicting the structure of a protein-protein complex starting from the known structures of the individual protein components. More often than not, however, the structure of one or both components is not known, but can be derived by homology modeling on the basis of known structures of related proteins deposited in the Protein Data Bank (PDB). Thus, the problem is to develop methods that optimally integrate homology modeling and docking with the goal of predicting the structure of a complex directly from the amino acid sequences of its component proteins. One possibility is to use the best available homology modeling and docking methods. However, the models built for the individual subunits often differ to a significant degree from the bound conformation in the complex, often much more so than the differences observed between free and bound structures of the same protein, and therefore additional conformational adjustments, both at the backbone and side chain levels need to be modeled to achieve an accurate docking prediction. In particular, even homology models of overall good accuracy frequently include localized errors that unfavorably impact docking results. The predicted reliability of the different regions in the model can also serve as a useful input for the docking calculations. Here we present a benchmark dataset that should help to explore and solve combined modeling and docking problems. This dataset comprises a subset of the experimentally solved 'target' complexes from the widely used Docking Benchmark from the Weng Lab (excluding antibody-antigen complexes). This subset is extended to include the structures from the PDB related to those of the individual components of each complex, and hence represent potential templates for investigating and benchmarking integrated homology modeling and docking approaches. 
Template sets can be dynamically customized by specifying ranges in sequence similarity and in PDB release dates, or using other filtering options, such as excluding sets of specific structures from the template list. Multiple sequence alignments, as well as structural alignments of the templates to their corresponding subunits in the target are also provided. The resource is accessible online or can be downloaded at http://cluspro.org/benchmark, and is updated on a weekly basis in synchrony with new PDB releases. Proteins 2016; 85:10-16. © 2016 Wiley Periodicals, Inc.
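
    The template customization described in this record (sequence-similarity ranges, release-date cut-offs, and excluded structures) amounts to a simple filter. The sketch below illustrates that filtering logic on hypothetical records. The field names, identifiers, and values are invented for illustration and are not the interface of the cluspro.org benchmark resource.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Template:
    template_id: str        # hypothetical identifier
    seq_identity: float     # % identity to the target subunit
    release_date: date


def select_templates(templates, min_identity, max_identity,
                     released_before=None, exclude=frozenset()):
    """Keep templates inside the identity range, released before an optional
    cut-off date, and not explicitly excluded."""
    return [t for t in templates
            if min_identity <= t.seq_identity <= max_identity
            and (released_before is None or t.release_date < released_before)
            and t.template_id not in exclude]


if __name__ == "__main__":
    pool = [
        Template("tmpl_a", 35.0, date(2010, 5, 1)),
        Template("tmpl_b", 80.0, date(2012, 1, 1)),
        Template("tmpl_c", 40.0, date(2016, 3, 1)),
        Template("tmpl_d", 30.0, date(2009, 1, 1)),
    ]
    chosen = select_templates(pool, 25.0, 50.0,
                              released_before=date(2015, 1, 1),
                              exclude={"tmpl_d"})
    print([t.template_id for t in chosen])  # ['tmpl_a']
```

    Filtering by release date is what allows a benchmark to mimic a realistic prediction scenario, in which templates deposited after the target structure would not yet have been available.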

  19. Template-based structure modeling of protein-protein interactions

    PubMed Central

    Szilagyi, Andras; Zhang, Yang

    2014-01-01

    The structure of protein-protein complexes can be constructed by using the known structure of other protein complexes as a template. The complex structure templates are generally detected either by homology-based sequence alignments or, given the structure of monomer components, by structure-based comparisons. Critical improvements have been made in recent years by utilizing interface recognition and by recombining monomer and complex template libraries. Encouraging progress has also been witnessed in genome-wide applications of template-based modeling, with modeling accuracy comparable to high-throughput experimental data. Nevertheless, bottlenecks exist due to the incompleteness of the protein-protein complex structure library and the lack of methods for distant homologous template identification and full-length complex structure refinement. PMID:24721449

  20. Novel species including Mycobacterium fukienense sp. is found from tuberculosis patients in Fujian Province, China, using phylogenetic analysis of Mycobacterium chelonae/abscessus complex.

    PubMed

    Zhang, Yuan Yuan; Li, Yan Bing; Huang, Ming Xiang; Zhao, Xiu Qin; Zhang, Li Shui; Liu, Wen En; Wan, Kang Lin

    2013-11-01

    To identify the novel species 'Mycobacterium fukienense' sp. nov. of the Mycobacterium chelonae/abscessus complex from tuberculosis patients in Fujian Province, China. Five of 27 clinical Mycobacterium isolates (CIs) were previously identified as M. chelonae/abscessus complex by sequencing the hsp65, rpoB, 16S-23S rRNA internal transcribed spacer region (ITS), recA and sodA housekeeping genes commonly used to describe the molecular characteristics of Mycobacterium. Clinical Mycobacterium isolates were classified according to the gene sequence using a clustering analysis program. Sequence similarity within clusters and diversity between clusters were analyzed. The 5 isolates had distinct sequences exhibiting 99.8% homology in the hsp65 gene. However, a complete lack of homology was observed among the sequences of the rpoB, 16S-23S rRNA internal transcribed spacer region (ITS), sodA, and recA genes as compared with M. abscessus. Furthermore, no match for the rpoB, sodA, and recA genes was identified among the published sequences. The novel species, Mycobacterium fukienense, was identified from tuberculosis patients in Fujian Province, China, and does not belong to any existing subspecies of the M. chelonae/abscessus complex. Copyright © 2013 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.

  1. Biological intuition in alignment-free methods: response to Posada.

    PubMed

    Ragan, Mark A; Chan, Cheong Xin

    2013-08-01

    A recent editorial in Journal of Molecular Evolution highlights opportunities and challenges facing molecular evolution in the era of next-generation sequencing. Abundant sequence data should allow more-complex models to be fit at higher confidence, making phylogenetic inference more reliable and improving our understanding of evolution at the molecular level. However, concern that approaches based on multiple sequence alignment may be computationally infeasible for large datasets is driving the development of so-called alignment-free methods for sequence comparison and phylogenetic inference. The recent editorial characterized these approaches as model-free, not based on the concept of homology, and lacking in biological intuition. We argue here that alignment-free methods have not abandoned models or homology, and can be biologically intuitive.
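To make the alignment-free idea concrete, here is a minimal sketch of one classic alignment-free comparison: each sequence is reduced to its k-mer (word) counts, and two sequences are compared with the D2 statistic (the number of shared k-mer occurrences) and its cosine-normalized form. This is an illustrative assumption, not the specific method discussed by Ragan and Chan or by the editorial they respond to. Sequences that share many words score as similar, which is one way such statistics can still reflect homology without an explicit alignment.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer (word of length k) in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a: Counter, b: Counter) -> int:
    """D2 statistic: sum over shared words w of count_a(w) * count_b(w)."""
    return sum(n * b[w] for w, n in a.items())  # Counter returns 0 for missing words


def cosine_similarity(a: Counter, b: Counter) -> float:
    """D2 normalized by the vector lengths, giving a value between 0 and 1."""
    norm_a = sqrt(sum(n * n for n in a.values()))
    norm_b = sqrt(sum(n * n for n in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile shares nothing
    return d2(a, b) / (norm_a * norm_b)


x = kmer_counts("ACGTACGT", 3)
y = kmer_counts("ACGTTCGT", 3)
print(d2(x, y), round(cosine_similarity(x, y), 3))  # 6 0.671
```

The choice of k matters: short words are shared by chance between unrelated sequences, while long words become too rare to detect distant similarity.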

  2. Prefiltering Model for Homology Detection Algorithms on GPU.

    PubMed

    Retamosa, Germán; de Pedro, Luis; González, Ivan; Tamames, Javier

    2016-01-01

    Homology detection has evolved over time from heavy algorithms based on dynamic programming approaches to lightweight alternatives based on different heuristic models. However, the main problem with these algorithms is that they use complex statistical models, which makes it difficult to achieve a relevant speedup while reproducing the original results exactly. Thus, accelerating them is essential. The aim of this work was to prefilter a sequence database. To do so, we implemented a novel heuristic model on NVIDIA graphics processing units (GPUs) and multicore processors. Depending on the sensitivity settings, this makes it possible to quickly reduce the size of the sequence database by 50% to 95% while rejecting no significant sequences. Furthermore, this prefiltering application can be used together with multiple homology detection algorithms as part of a next-generation sequencing system. Extensive performance and accuracy tests have been carried out at the Spanish National Centre for Biotechnology (NCB). The results show that GPU hardware can accelerate the execution of existing homology detection applications, such as the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool for Proteins (BLASTP), by up to a factor of 4.

  3. PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases.

    PubMed

    Floden, Evan W; Tommaso, Paolo D; Chatzou, Maria; Magis, Cedrik; Notredame, Cedric; Chang, Jia-Ming

    2016-07-08

    The PSI/TM-Coffee web server performs multiple sequence alignment (MSA) of proteins by combining homology extension with a consistency-based alignment approach. Homology extension is performed with Position-Specific Iterative (PSI) BLAST searches against a choice of redundant and non-redundant databases. The main novelty of this server is that it allows databases of reduced complexity to be used for rapid homology extension. The server can also use transmembrane protein (TMP) reference databases to allow even faster homology extension for this important category of proteins. Aside from an MSA, the server also outputs topological predictions of TMPs using the HMMTOP algorithm. Previous benchmarking of the method has shown that this approach outperforms the most accurate alignment methods, such as MSAProbs, Kalign, PROMALS, MAFFT, ProbCons and PRALINE™. The web server is available at http://tcoffee.crg.cat/tmcoffee. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. SINE sequences detect DNA fingerprints in salmonid fishes.

    PubMed

    Spruell, P; Thorgaard, G H

    1996-04-01

    DNA probes homologous to two previously described salmonid short interspersed nuclear elements (SINEs) detected DNA fingerprint patterns in 14 species of salmonid fishes. The probes showed more homology to some species than to others and little homology to three nonsalmonid fishes. The DNA fingerprint patterns derived from the SINE probes are individual-specific and inherited in a Mendelian manner. Probes derived from different regions of the same SINE detect only partially overlapping banding patterns, reflecting a more complex SINE structure than has been previously reported. Like the human Alu sequence, the SINEs found in salmonids could provide useful genetic markers and primer sites for PCR-based techniques. These elements may be more desirable for some applications than traditional DNA fingerprinting probes that detect tandemly repeated arrays.

  5. Coupling unbiased mutagenesis to high-throughput DNA sequencing uncovers functional domains in the Ndc80 kinetochore protein of Saccharomyces cerevisiae.

    PubMed

    Tien, Jerry F; Fong, Kimberly K; Umbreit, Neil T; Payen, Celia; Zelter, Alex; Asbury, Charles L; Dunham, Maitreya J; Davis, Trisha N

    2013-09-01

    During mitosis, kinetochores physically link chromosomes to the dynamic ends of spindle microtubules. This linkage depends on the Ndc80 complex, a conserved and essential microtubule-binding component of the kinetochore. As a member of the complex, the Ndc80 protein forms microtubule attachments through a calponin homology domain. Ndc80 is also required for recruiting other components to the kinetochore and responding to mitotic regulatory signals. While the calponin homology domain has been the focus of biochemical and structural characterization, the function of the remainder of Ndc80 is poorly understood. Here, we utilized a new approach that couples high-throughput sequencing to a saturating linker-scanning mutagenesis screen in Saccharomyces cerevisiae. We identified domains in previously uncharacterized regions of Ndc80 that are essential for its function in vivo. We show that a helical hairpin adjacent to the calponin homology domain influences microtubule binding by the complex. Furthermore, a mutation in this hairpin abolishes the ability of the Dam1 complex to strengthen microtubule attachments made by the Ndc80 complex. Finally, we defined a C-terminal segment of Ndc80 required for tetramerization of the Ndc80 complex in vivo. This unbiased mutagenesis approach can be generally applied to genes in S. cerevisiae to identify functional properties and domains.

  6. ComplexContact: a web server for inter-protein contact prediction using deep learning.

    PubMed

    Zeng, Hong; Wang, Sheng; Zhou, Tianming; Zhao, Feifeng; Li, Xiufeng; Wu, Qing; Xu, Jinbo

    2018-05-22

    ComplexContact (http://raptorx2.uchicago.edu/ComplexContact/) is a web server for sequence-based prediction of interfacial residue-residue contacts in a putative protein complex. Interfacial residue-residue contacts are critical for understanding how proteins form complexes and interact at the residue level. Given a pair of protein sequences, ComplexContact first searches for their sequence homologs and builds two paired multiple sequence alignments (MSAs); it then applies co-evolution analysis and a CASP-winning deep learning (DL) method to predict interfacial contacts from the paired MSAs and visualizes the prediction as an image. The DL method was originally developed for intra-protein contact prediction and performed best in CASP12. Our large-scale experimental test further shows that ComplexContact greatly outperforms pure co-evolution methods for inter-protein contact prediction, regardless of the species.

  7. Origin and spread of photosynthesis based upon conserved sequence features in key bacteriochlorophyll biosynthesis proteins.

    PubMed

    Gupta, Radhey S

    2012-11-01

    The origin of photosynthesis and how this capability has spread to other bacterial phyla remain important unresolved questions. I describe here a number of conserved signature indels (CSIs) in key proteins involved in bacteriochlorophyll (Bchl) biosynthesis that provide important insights into these questions. The proteins BchL and BchX, which are essential for Bchl biosynthesis, are derived by gene duplication in a common ancestor of all phototrophs. A more ancient gene duplication gave rise to the BchX-BchL proteins and the NifH protein of the nitrogenase complex. The sequence alignment of NifH-BchX-BchL proteins contains two CSIs that are uniquely shared by all NifH and BchX homologs, but not by any BchL homologs. These CSIs and phylogenetic analysis of NifH-BchX-BchL protein sequences strongly suggest that the BchX homologs are ancestral to BchL and that Bchl-based anoxygenic photosynthesis originated prior to chlorophyll (Chl)-based photosynthesis in cyanobacteria. Another CSI in the BchX-BchL sequence alignment that is uniquely shared by all BchX homologs and the BchL sequences from Heliobacteriaceae, but absent in all other BchL homologs, suggests that the BchL homologs from Heliobacteriaceae are primitive in comparison to those of all other photosynthetic lineages. Several other CSIs identified in the BchN homologs are shared by all proteobacterial homologs and a clade consisting of the marine unicellular Cyanobacteria (Clade C). These CSIs, in conjunction with the results of phylogenetic analyses and pairwise sequence similarity of the BchL, BchN, and BchB proteins, where the homologs from Clade C Cyanobacteria and Proteobacteria exhibited a close relationship, provide strong evidence that these two groups have incurred lateral gene transfers (LGTs). 
Additionally, phylogenetic analyses and several CSIs in the BchL-N-B proteins that are uniquely shared by all Chlorobi and Chloroflexi homologs provide evidence that the genes for these proteins have also been laterally transferred between these groups. Other results and observations reported here indicate that the genes for the BchL-N-B proteins in Proteobacteria are derived from the Clade C Cyanobacteria, whereas those in Chlorobi were acquired from Chloroflexus or related bacteria by means of LGTs. Some implications of these observations regarding the origin and spread of photosynthesis are discussed.

  8. Trans-Homolog Interactions Facilitating Paramutation in Maize

    PubMed Central

    2015-01-01

    Paramutations represent locus-specific trans-homolog interactions affecting the heritable silencing properties of endogenous alleles. Although examples of paramutation are well studied in maize (Zea mays), the responsible mechanisms remain unclear. Genetic analyses indicate roles for plant-specific DNA-dependent RNA polymerases that generate small RNAs, and current working models hypothesize that these small RNAs direct heritable changes at sequences often acting as transcriptional enhancers. Several studies have defined specific sequences that mediate paramutation behaviors, and recent results identify a diversity of DNA-dependent RNA polymerase complexes operating in maize. Other reports ascribe broader roles for some of these complexes in normal genome function. This review highlights recent research to understand the molecular mechanisms of paramutation and examines evidence relevant to small RNA-based modes of transgenerational epigenetic inheritance. PMID:26149572

  9. Metagenomic gene annotation by a homology-independent approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Froula, Jeff; Zhang, Tao; Salmeen, Annette

    2011-06-02

    Fully understanding the genetic potential of a microbial community requires functional annotation of all the genes it encodes. The recently developed deep metagenome sequencing approach has enabled rapid identification of millions of genes from a complex microbial community without cultivation. Current homology-based gene annotation fails to detect distantly related or structural homologs. Furthermore, homology searches with millions of genes are very computationally intensive. To overcome these limitations, we developed rhModeller, a homology-independent software pipeline to efficiently annotate genes from metagenomic sequencing projects. Using cellulases and carbonic anhydrases as two independent test cases, we demonstrated that rhModeller is much faster than HMMER but with comparable accuracy, at 94.5% and 99.9% accuracy, respectively. More importantly, rhModeller has the ability to detect novel proteins that do not share significant homology with any known protein family. As ~50% of the 2 million genes derived from the cow rumen metagenome failed to be annotated based on sequence homology, we tested whether rhModeller could be used to annotate these genes. Preliminary results suggest that rhModeller is robust in the presence of missense and frameshift mutations, two common errors in metagenomic genes. Applying the pipeline to the cow rumen genes identified 4,990 novel cellulase candidates and 8,196 novel carbonic anhydrase candidates. In summary, we expect rhModeller to dramatically increase the speed and quality of metagenomic gene annotation.

  10. Sequence homology between HLA-bound cytomegalovirus and human peptides: A potential trigger for alloreactivity

    PubMed Central

    Koparde, Vishal N.; Jameson-Lee, Maximilian; Elnasseh, Abdelrhman G.; Scalora, Allison F.; Kobulnicky, David J.; Serrano, Myrna G.; Roberts, Catherine H.; Buck, Gregory A.; Neale, Michael C.; Nixon, Daniel E.; Toor, Amir A.

    2017-01-01

    Human cytomegalovirus (hCMV) reactivation may often coincide with the development of graft-versus-host disease (GVHD) in stem cell transplantation (SCT). Seventy-seven SCT donor-recipient pairs (DRP) (HLA-matched unrelated donor (MUD), n = 50; matched related donor (MRD), n = 27) underwent whole exome sequencing to identify single nucleotide polymorphisms (SNPs) generating alloreactive peptide libraries for each DRP (9-mer peptide-HLA complexes). A Human CMV CROSS (Cross-Reactive Open Source Sequence) database was compiled from NCBI; HLA class I binding affinity for each DRP's HLA was calculated by NetMHCpan 2.8, and hCMV-derived 9-mers were algorithmically compared to the alloreactive peptide-HLA complex libraries. Short consecutive (≥6) amino acid (AA) sequence homology matching hCMV to recipient peptides was considered for HLA-bound-peptide (IC50<500nM) cross-reactivity. Of the 70,686 hCMV 9-mers contained within the hCMV CROSS database, an average of 29,658 matched the MRD DRP alloreactive peptides and 52,910 matched MUD DRP peptides (p<0.001). In silico analysis revealed multiple high-affinity, immunogenic CMV-human peptide matches (IC50<500 nM) expressed in a GVHD-affected, tissue-specific manner. hCMV+GVHD was found in 18 patients, 13 of whom developed hCMV viremia before GVHD onset. Analysis of patients with GVHD identified potential cross-reactive peptide expression within affected organs. We propose that hCMV peptide sequence homology with human alloreactive peptides may contribute to the pathophysiology of GVHD. PMID:28800601
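The ≥6-consecutive-residue criterion can be sketched as follows. This assumes the viral and human 9-mers are compared position by position in the same register (the abstract does not say whether shifted matches were allowed) and that an IC50 value for the human peptide on the recipient's HLA is already available (in the study it came from NetMHCpan 2.8; here it is simply an input). The function names and the peptides are made up for illustration.

```python
def longest_identical_run(viral: str, human: str) -> int:
    """Longest stretch of identical residues at the same positions of two equal-length peptides."""
    if len(viral) != len(human):
        raise ValueError("peptides must have the same length (e.g. two 9-mers)")
    best = run = 0
    for a, b in zip(viral, human):
        run = run + 1 if a == b else 0  # extend or reset the current identical stretch
        best = max(best, run)
    return best


def is_candidate_cross_reactive(viral: str, human: str, human_ic50_nm: float,
                                min_run: int = 6, ic50_cutoff_nm: float = 500.0) -> bool:
    """Flag pairs sharing >= min_run consecutive residues whose human peptide binds HLA (IC50 < cutoff)."""
    return longest_identical_run(viral, human) >= min_run and human_ic50_nm < ic50_cutoff_nm


# Made-up 9-mers: identical at positions 1-7, different at position 8, identical at position 9.
print(longest_identical_run("ACDEFGHIK", "ACDEFGHLK"))                # 7
print(is_candidate_cross_reactive("ACDEFGHIK", "ACDEFGHLK", 120.0))  # True
```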

  11. Sequence homology between HLA-bound cytomegalovirus and human peptides: A potential trigger for alloreactivity.

    PubMed

    Hall, Charles E; Koparde, Vishal N; Jameson-Lee, Maximilian; Elnasseh, Abdelrhman G; Scalora, Allison F; Kobulnicky, David J; Serrano, Myrna G; Roberts, Catherine H; Buck, Gregory A; Neale, Michael C; Nixon, Daniel E; Toor, Amir A

    2017-01-01

    Human cytomegalovirus (hCMV) reactivation may often coincide with the development of graft-versus-host disease (GVHD) in stem cell transplantation (SCT). Seventy-seven SCT donor-recipient pairs (DRP) (HLA-matched unrelated donor (MUD), n = 50; matched related donor (MRD), n = 27) underwent whole exome sequencing to identify single nucleotide polymorphisms (SNPs) generating alloreactive peptide libraries for each DRP (9-mer peptide-HLA complexes). A Human CMV CROSS (Cross-Reactive Open Source Sequence) database was compiled from NCBI; HLA class I binding affinity for each DRP's HLA was calculated by NetMHCpan 2.8, and hCMV-derived 9-mers were algorithmically compared to the alloreactive peptide-HLA complex libraries. Short consecutive (≥6) amino acid (AA) sequence homology matching hCMV to recipient peptides was considered for HLA-bound-peptide (IC50<500nM) cross-reactivity. Of the 70,686 hCMV 9-mers contained within the hCMV CROSS database, an average of 29,658 matched the MRD DRP alloreactive peptides and 52,910 matched MUD DRP peptides (p<0.001). In silico analysis revealed multiple high-affinity, immunogenic CMV-human peptide matches (IC50<500 nM) expressed in a GVHD-affected, tissue-specific manner. hCMV+GVHD was found in 18 patients, 13 of whom developed hCMV viremia before GVHD onset. Analysis of patients with GVHD identified potential cross-reactive peptide expression within affected organs. We propose that hCMV peptide sequence homology with human alloreactive peptides may contribute to the pathophysiology of GVHD.

  12. Scalable Parallel Methods for Analyzing Metagenomics Data at Extreme Scale

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Daily, Jeffrey A.

    2015-05-01

    The field of bioinformatics and computational biology is currently experiencing a data revolution. The exciting prospect of making fundamental biological discoveries is fueling the rapid development and deployment of numerous cost-effective, high-throughput next-generation sequencing technologies. The result is that DNA and protein sequence repositories are being bombarded with new sequence information. Databases continue to report a Moore’s law-like growth trajectory in their sizes, roughly doubling every 18 months. In what seems to be a paradigm shift, individual projects are now capable of generating billions of raw sequences that need to be analyzed in the presence of already annotated sequence information. While it is clear that data-driven methods, such as sequence homology detection, are becoming the mainstay in the field of computational life sciences, the algorithmic advancements essential for implementing complex data analytics at scale have mostly lagged behind. Sequence homology detection is central to a number of bioinformatics applications, including genome sequencing and protein family characterization. Given millions of sequences, the goal is to identify all pairs of sequences that are highly similar (or “homologous”) on the basis of alignment criteria. While there are optimal alignment algorithms to compute pairwise homology, deploying them at large scale is currently not feasible; instead, heuristic methods are used at the expense of quality. In this dissertation, we present the design and evaluation of a parallel implementation for conducting optimal homology detection on distributed memory supercomputers. Our approach uses a combination of techniques from asynchronous load balancing (viz. work stealing, dynamic task counters), data replication, and exact-matching filters to achieve homology detection at scale. 
Results for a collection of 2.56M sequences show parallel efficiencies of ~75-100% on up to 8K cores, representing a time-to-solution of 33 seconds. We extend this work with a detailed analysis of single-node sequence alignment performance using the latest CPU vector instruction set extensions. Preliminary results reveal that current sequence alignment algorithms are unable to fully utilize widening vector registers.
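A toy, serial version of the filter-then-align strategy: a pair of sequences is passed to an optimal (Smith-Waterman) local alignment only if the two share at least one exact k-mer. The scoring parameters (match +2, mismatch -1, linear gap -2), k = 3, the score threshold and the helper names are assumptions for illustration; none of the dissertation's parallel machinery is reproduced. Note that an exact k-mer filter can, in general, discard weakly similar pairs, so k and the threshold trade sensitivity for speed.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal local alignment score with a linear gap penalty, O(len(a) * len(b)) time."""
    prev = [0] * (len(b) + 1)  # previous DP row
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)  # 0 = start a new local alignment
            best = max(best, cur[j])
        prev = cur
    return best


def kmer_set(s: str, k: int) -> Set[str]:
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def homologous_pairs(seqs: Dict[str, str], k: int = 3, min_score: int = 10) -> List[Tuple[str, str, int]]:
    """All-vs-all search: cheap exact k-mer filter first, optimal alignment only for survivors."""
    index = {name: kmer_set(s, k) for name, s in seqs.items()}
    hits = []
    for x, y in combinations(sorted(seqs), 2):
        if not index[x] & index[y]:
            continue  # no shared exact k-mer: skip the expensive alignment
        score = smith_waterman(seqs[x], seqs[y])
        if score >= min_score:
            hits.append((x, y, score))
    return hits


seqs = {"s1": "MKTAYIAKQR", "s2": "MKTAYIAKQL", "s3": "GGGGGGGGGG"}
print(homologous_pairs(seqs))  # [('s1', 's2', 18)]
```

In a distributed setting, the independent pair comparisons in the loop are the natural unit of work to hand out to workers.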

  13. Structure and DNA-Binding Sites of the SWI1 AT-rich Interaction Domain (ARID) Suggest Determinants for Sequence-Specific DNA Recognition

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Suhkmann; Zhang, Ziming; Upchurch, Sean

    2004-04-16

    ARID is a homologous family of DNA-binding domains that occur in DNA-binding proteins from a wide variety of species, ranging from yeast to nematodes, insects, mammals and plants. SWI1, a member of the SWI/SNF protein complex that is involved in chromatin remodeling during transcription, contains the ARID motif. The ARID domain of human SWI1 (also known as p270) does not select for a specific DNA sequence from a random sequence pool. The lack of sequence specificity shown by the SWI1 ARID domain stands in contrast to the other characterized ARID domains, which recognize specific AT-rich sequences. We have solved the three-dimensional structure of human SWI1 ARID using solution NMR methods. In addition, we have characterized non-specific DNA binding by the SWI1 ARID domain. Results from this study indicate that a flexible long internal loop in the ARID motif is likely to be important for sequence-specific DNA recognition. The structure of the human SWI1 ARID domain also represents a distinct structural subfamily. Studies of ARID indicate that the boundaries of the DNA-binding structural and functional domains can extend beyond the sequence-homologous region in a homologous family of proteins. Structural studies of homologous domains such as the ARID family of DNA-binding domains should provide information to better predict the boundaries of structural and functional domains in structural genomic studies. Key Words: ARID, SWI1, NMR, structural genomics, protein-DNA interaction.

  14. Comparative structural analysis of Bru1 region homeologs in Saccharum spontaneum and S. officinarum

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Jisen; Sharma, Anupma; Yu, Qingyi

    Sugarcane is a major sugar and biofuel crop, but genomic research and molecular breeding have lagged behind other major crops due to the complexity of its auto-allopolyploid genomes. Sugarcane cultivars are frequently aneuploid, with chromosome numbers ranging from 100 to 130, consisting of 70-80% S. officinarum, 10-20% S. spontaneum, and 10% recombinants between these two species. Analysis of a genomic region in the progenitor autopolyploid genomes of sugarcane hybrid cultivars can reveal the nature and divergence of homologous chromosomes. To investigate the origin and evolution of haplotypes in the Bru1 genomic regions in sugarcane cultivars, we identified two BAC clones from S. spontaneum and four from S. officinarum and compared them to seven haplotype sequences from sugarcane hybrid R570. The results clarified the origin of the seven homologous haplotypes in R570: four haplotypes originated from S. officinarum, two from S. spontaneum, and one was recombinant. The homologous haplotypes differed by retrotransposon insertions and sequence variations, with sequence divergence ranging from 18.2% to 60.5% (average 33.7%). Gene content and gene structure were relatively well conserved among the homologous haplotypes. Exon splitting occurred in haplotypes of the hybrid genome but not in its progenitor genomes. Tajima's D analysis revealed that S. spontaneum haplotypes in the Bru1 genomic regions were under strong directional selection. Numerous inversions, deletions, insertions and translocations were found between haplotypes within each genome. In conclusion, this is the first comparison among haplotypes of a modern sugarcane hybrid and its two progenitors. The Tajima's D results emphasized the crucial role of this fungal disease resistance gene in enhancing the fitness of this species and indicated that the brown rust resistance gene in R570 is from S. spontaneum. 
Species-specific InDels, sequence similarity and phylogenetic analysis of homologous genes can be used to identify the origin of S. spontaneum and S. officinarum haplotypes in Saccharum hybrids. Comparison of exon splitting among the homologous haplotypes suggested that the genome rearrangements in Saccharum hybrids S. officinarum would be sufficient for proper genome assembly of this autopolyploid genome. The sequence divergence among homologous haplotypes, arising from retrotransposon insertions and sequence variations, may allow sequencing and assembly of the autopolyploid Saccharum genomes and the auto-allopolyploid hybrid genomes using whole genome shotgun sequencing.
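For readers unfamiliar with the statistic, Tajima's D compares two estimates of the population mutation parameter: the mean number of pairwise differences (π) and the number of segregating sites (S) scaled by a1 = Σ 1/i for i = 1..n-1. Below is a minimal sketch of the standard formula, assuming equal-length, gap-free aligned haplotypes (n ≥ 4); it makes no claim about how the authors computed D, and the example haplotypes are made up.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def tajimas_d(haplotypes: Sequence[str]) -> float:
    """Tajima's D for equal-length, gap-free aligned sequences (standard formula)."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 sequences")  # the variance term is zero for n < 4
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites
    s = sum(1 for col in zip(*haplotypes) if len(set(col)) > 1)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: average number of pairwise differences
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


print(round(tajimas_d(["AAAA", "AAAT", "AATT", "TAAA"]), 3))  # 0.168
```

Values near zero are consistent with neutral expectations; departures in either direction can reflect selection but also demographic history, so D is usually interpreted alongside other evidence.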

  15. Comparative structural analysis of Bru1 region homeologs in Saccharum spontaneum and S. officinarum

    DOE PAGES

    Zhang, Jisen; Sharma, Anupma; Yu, Qingyi; ...

    2016-06-10

    Sugarcane is a major sugar and biofuel crop, but genomic research and molecular breeding have lagged behind other major crops due to the complexity of its auto-allopolyploid genomes. Sugarcane cultivars are frequently aneuploid, with chromosome numbers ranging from 100 to 130, consisting of 70-80% S. officinarum, 10-20% S. spontaneum, and 10% recombinants between these two species. Analysis of a genomic region in the progenitor autopolyploid genomes of sugarcane hybrid cultivars can reveal the nature and divergence of homologous chromosomes. To investigate the origin and evolution of haplotypes in the Bru1 genomic regions in sugarcane cultivars, we identified two BAC clones from S. spontaneum and four from S. officinarum and compared them to seven haplotype sequences from sugarcane hybrid R570. The results clarified the origin of the seven homologous haplotypes in R570: four haplotypes originated from S. officinarum, two from S. spontaneum, and one was recombinant. The homologous haplotypes differed by retrotransposon insertions and sequence variations, with sequence divergence ranging from 18.2% to 60.5% (average 33.7%). Gene content and gene structure were relatively well conserved among the homologous haplotypes. Exon splitting occurred in haplotypes of the hybrid genome but not in its progenitor genomes. Tajima's D analysis revealed that S. spontaneum haplotypes in the Bru1 genomic regions were under strong directional selection. Numerous inversions, deletions, insertions and translocations were found between haplotypes within each genome. In conclusion, this is the first comparison among haplotypes of a modern sugarcane hybrid and its two progenitors. The Tajima's D results emphasized the crucial role of this fungal disease resistance gene in enhancing the fitness of this species and indicated that the brown rust resistance gene in R570 is from S. spontaneum. 
Species-specific InDels, sequence similarity and phylogenetic analysis of homologous genes can be used to identify the origin of S. spontaneum and S. officinarum haplotypes in Saccharum hybrids. Comparison of exon splitting among the homologous haplotypes suggested that the genome rearrangements in Saccharum hybrids S. officinarum would be sufficient for proper genome assembly of this autopolyploid genome. The sequence divergence among homologous haplotypes, arising from retrotransposon insertions and sequence variations, may allow sequencing and assembly of the autopolyploid Saccharum genomes and the auto-allopolyploid hybrid genomes using whole genome shotgun sequencing.

  16. Tolerance of DNA Mismatches in Dmc1 Recombinase-mediated DNA Strand Exchange.

    PubMed

    Borgogno, María V; Monti, Mariela R; Zhao, Weixing; Sung, Patrick; Argaraña, Carlos E; Pezza, Roberto J

    2016-03-04

    Recombination between homologous chromosomes is required for the faithful meiotic segregation of chromosomes and leads to the generation of genetic diversity. The conserved meiosis-specific Dmc1 recombinase catalyzes homologous recombination triggered by DNA double strand breaks through the exchange of parental DNA sequences. Although it must provide efficient DNA strand exchange between polymorphic alleles, Dmc1 must also guard against recombination between divergent sequences. How DNA mismatches affect Dmc1-mediated DNA strand exchange is not understood. We have used fluorescence resonance energy transfer to study the mechanism of Dmc1-mediated strand exchange between DNA oligonucleotides with different degrees of heterology. The efficiency of strand exchange is highly sensitive to the location, type, and distribution of mismatches. Mismatches near the 3' end of the initiating DNA strand have a small effect, whereas most mismatches near the 5' end impede strand exchange dramatically. The Hop2-Mnd1 protein complex stimulates Dmc1-catalyzed strand exchange on homologous DNA or on DNA containing a single mismatch. We observed that Dmc1 can reject divergent DNA sequences while bypassing a few mismatches in the DNA sequence. Our findings have important implications for understanding meiotic recombination. First, Dmc1 acts as an initial barrier against heterologous recombination, with the mismatch repair system providing a second level of proofreading, to ensure that ectopic sequences are not recombined. Second, Dmc1 stepping over infrequent mismatches is likely critical for allowing recombination between the polymorphic sequences of homologous chromosomes, thus contributing to gene conversion and genetic diversity. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.

  17. Tolerance of DNA Mismatches in Dmc1 Recombinase-mediated DNA Strand Exchange

    PubMed Central

    Borgogno, María V.; Monti, Mariela R.; Zhao, Weixing; Sung, Patrick; Argaraña, Carlos E.; Pezza, Roberto J.

    2016-01-01

    Recombination between homologous chromosomes is required for the faithful meiotic segregation of chromosomes and leads to the generation of genetic diversity. The conserved meiosis-specific Dmc1 recombinase catalyzes homologous recombination triggered by DNA double strand breaks through the exchange of parental DNA sequences. Although it must provide efficient DNA strand exchange between polymorphic alleles, Dmc1 must also guard against recombination between divergent sequences. How DNA mismatches affect Dmc1-mediated DNA strand exchange is not understood. We have used fluorescence resonance energy transfer to study the mechanism of Dmc1-mediated strand exchange between DNA oligonucleotides with different degrees of heterology. The efficiency of strand exchange is highly sensitive to the location, type, and distribution of mismatches. Mismatches near the 3′ end of the initiating DNA strand have a small effect, whereas most mismatches near the 5′ end impede strand exchange dramatically. The Hop2-Mnd1 protein complex stimulates Dmc1-catalyzed strand exchange on homologous DNA or on DNA containing a single mismatch. We observed that Dmc1 can reject divergent DNA sequences while bypassing a few mismatches in the DNA sequence. Our findings have important implications for understanding meiotic recombination. First, Dmc1 acts as an initial barrier against heterologous recombination, with the mismatch repair system providing a second level of proofreading, to ensure that ectopic sequences are not recombined. Second, Dmc1 stepping over infrequent mismatches is likely critical for allowing recombination between the polymorphic sequences of homologous chromosomes, thus contributing to gene conversion and genetic diversity. PMID:26709229

  18. Restriction and Sequence Alterations Affect DNA Uptake Sequence-Dependent Transformation in Neisseria meningitidis

    PubMed Central

    Ambur, Ole Herman; Frye, Stephan A.; Nilsen, Mariann; Hovland, Eirik; Tønjum, Tone

    2012-01-01

    Transformation is a complex process that involves several interactions, from the binding and uptake of naked DNA to homologous recombination. Some of these steps affect transformation favourably, whereas others limit it. Here, meticulous manipulation of a single type of transforming DNA allowed the impact of three different mediators of meningococcal transformation to be quantified: NlaIV restriction, homologous recombination and the DNA Uptake Sequence (DUS). In the wild type, an inverse relationship between transformation frequency and the number of NlaIV restriction sites in the DNA was observed when the transforming DNA harboured a heterologous region for selection (ermC), but not when the transforming DNA was homologous except for a single nucleotide heterology. The influence of homologous sequence in transforming DNA was further studied using plasmids with a small interruption or larger deletions in the recombinogenic region, and these alterations were found to impair transformation frequency. In contrast, short DUS in the transforming DNA are a particularly potent positive driver of DNA uptake in Neisseria sp. However, the molecular mechanism(s) responsible for DUS specificity remain unknown. Increasing the number of DUS in the transforming DNA was shown here to exert a positive effect on transformation. Furthermore, an influence of variable placement of DUS relative to the homologous region in the donor DNA was documented for the first time. No effect of altering the orientation of DUS was observed. These observations suggest that DUS is important at an early stage in the recognition of DNA, but they do not exclude the existence of more than one level of DUS specificity in the sequence of events that constitute transformation. New knowledge on the positive and negative drivers of transformation may, in a larger perspective, illuminate both the mechanisms and the evolutionary role(s) of one of the most conserved mechanisms in nature: homologous recombination. PMID:22768309

  19. Comparative sequence analysis of a region on human chromosome 13q14, frequently deleted in B-cell chronic lymphocytic leukemia, and its homologous region on mouse chromosome 14.

    PubMed

    Kapanadze, B; Makeeva, N; Corcoran, M; Jareborg, N; Hammarsund, M; Baranova, A; Zabarovsky, E; Vorontsova, O; Merup, M; Gahrton, G; Jansson, M; Yankovsky, N; Einhorn, S; Oscier, D; Grandér, D; Sangfelt, O

    2000-12-15

    Previous studies have indicated the presence of a putative tumor suppressor gene on human chromosome 13q14, commonly deleted in patients with B-cell chronic lymphocytic leukemia (B-CLL). We have recently identified a minimally deleted region encompassing parts of two adjacent genes, termed LEU1 and LEU2 (leukemia-associated genes 1 and 2), and several additional transcripts. In addition, 50 kb centromeric to this region we have identified another gene, LEU5/RFP2. To further elucidate the complex genomic organization of this region, we have identified, mapped, and sequenced the homologous region in the mouse. Fluorescence in situ hybridization analysis demonstrated that the region maps to mouse chromosome 14. The overall organization and gene order in this region were found to be highly conserved in the mouse. Sequence comparison between the human deletion hotspot region and its homologous mouse region revealed a high degree of sequence conservation, with an overall score of 74%. However, our data also show that, of the transcribed sequences, only two, human LEU2 and LEU5/RFP2, are clearly conserved, strengthening the case for these genes as candidate B-CLL tumor suppressor genes.

  20. Identification and quantification of homologous series of compound in complex mixtures: autocovariance study of GC/MS chromatograms.

    PubMed

    Pietrogrande, Maria Chiara; Zampolli, Maria Grazia; Dondi, Francesco

    2006-04-15

    The paper describes a method for determining homologous classes of compounds in a complex multicomponent chromatogram obtained under programmed elution conditions. The method is based on computing the autocovariance function of the experimental chromatogram (EACVF). The EACVF plot, if properly interpreted, can be regarded as a "class chromatogram", i.e., a virtual chromatogram formed by peaks whose positions and heights allow the different homologous series to be identified and quantified, even if they are embedded in a random complex chromatogram. Theoretical models were developed to describe complex chromatograms displaying a random retention pattern, ordered sequences, or a combination of the two. On the basis of the theoretical autocovariance function, the properties of the chromatogram can be evaluated experimentally under well-defined conditions; in particular, the two components of the chromatogram, ordered and random, can be identified. Moreover, the total number of single components (SCs), and the separate numbers of SCs belonging to the random and ordered components, can be determined when the two components display the same concentration. If the mixture contains several homologous series with a common frequency and different phase values, the number and identity of the different homologous series, as well as the number of SCs belonging to each of them, can be evaluated. The power of the EACVF method can be further magnified by applying it to single ion monitoring (SIM) signals, which selectively detect specific compound classes and so help identify the different homologous series. In this way, a full "decoding" of the complex multicomponent chromatogram is achieved. The method was validated on synthetic mixtures containing known amounts of SCs belonging to homologous series of hydrocarbons, alcohols, ketones, and aromatic compounds, in addition to other, structurally unrelated SCs.
The method was applied to both the total ion current (TIC) and the SIM signals to describe the procedure step by step. The systematic use of both SIM and TIC can simplify the decoding of complex chromatograms, either by singling out specific compound classes or by confirming the identification of the different homologous series. The method was further applied to a sample containing an unknown number of compounds and homologous series (a petroleum benzin, bp 140-160 °C): the results were meaningful in terms of both the number of components and the homologous series identified.
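    The core EACVF idea can be illustrated with a minimal sketch, assuming a uniformly sampled chromatogram and a synthetic test signal. The empirical autocovariance is computed lag by lag, and an ordered component (peaks at constant spacing, standing in for a homologous series) produces an autocovariance peak at the lag equal to that spacing, even when it is mixed with randomly placed peaks. The helper names, the Gaussian peak shape, the spacing of 100 points and the 1/N normalization are illustrative assumptions, not the authors' implementation; the published method additionally relies on theoretical models to count components and series.

```python
import numpy as np


def autocovariance(signal, max_lag):
    """Empirical autocovariance for lags 0..max_lag (biased estimator, divided by N)."""
    x = np.asarray(signal, dtype=float)
    n = x.size
    centered = x - x.mean()
    acvf = np.empty(max_lag + 1)
    for k in range(max_lag + 1):
        # Products of the centered signal with itself shifted by k points
        acvf[k] = np.dot(centered[: n - k], centered[k:]) / n
    return acvf


def gaussian_peaks(n_points, positions, heights, width):
    """Hypothetical helper: sum of Gaussian peaks on a uniform time axis."""
    t = np.arange(n_points)
    y = np.zeros(n_points)
    for pos, h in zip(positions, heights):
        y += h * np.exp(-0.5 * ((t - pos) / width) ** 2)
    return y


rng = np.random.default_rng(0)
n_points = 2000

# Ordered component: 16 peaks spaced 100 points apart (a stand-in homologous series)
ordered = gaussian_peaks(n_points, range(200, 1800, 100), [1.0] * 16, width=3.0)
# Random component: 16 peaks at random retention positions
random_part = gaussian_peaks(n_points, rng.uniform(50, 1950, size=16), [1.0] * 16, width=3.0)
chromatogram = ordered + random_part

acvf = autocovariance(chromatogram, max_lag=300)

# Skip small lags, where each peak overlaps only with itself
search_from = 20
best_lag = search_from + int(np.argmax(acvf[search_from:]))
```

    In this toy signal the strongest non-zero-lag peak of `acvf` falls at (or within a few points of) a multiple of the series spacing, which is the "class chromatogram" signature the abstract describes; the randomly placed peaks only add scattered small contributions.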

  1. Evaluation of targeted exome sequencing for 28 protein-based blood group systems, including the homologous gene systems, for blood group genotyping.

    PubMed

    Schoeman, Elizna M; Lopez, Genghis H; McGowan, Eunike C; Millard, Glenda M; O'Brien, Helen; Roulis, Eileen V; Liew, Yew-Wah; Martin, Jacqueline R; McGrath, Kelli A; Powley, Tanya; Flower, Robert L; Hyland, Catherine A

    2017-04-01

    Blood group single nucleotide polymorphism genotyping probes for only a limited range of polymorphisms. This study investigated whether massively parallel sequencing (also known as next-generation sequencing), with a targeted exome strategy, provides an extended blood group genotype and the extent to which massively parallel sequencing correctly genotypes in homologous gene systems, such as RH and MNS. Donor samples (n = 28), extensively phenotyped and genotyped using single nucleotide polymorphism typing, were analyzed using the TruSight One Sequencing Panel and MiSeq platform. Genes for 28 protein-based blood group systems, GATA1, and KLF1 were analyzed. Copy number variation analysis was used to characterize complex structural variants in the GYPC and RH systems. The average sequencing depth per target region was 66.2 ± 39.8. Each sample harbored on average 43 ± 9 variants, of which 10 ± 3 were used for genotyping. For the 28 samples, massively parallel sequencing variant sequences correctly matched expected sequences based on single nucleotide polymorphism genotyping data. Copy number variation analysis defined the Rh C/c alleles and complex RHD hybrids. Hybrid RHD*D-CE-D variants were correctly identified, but copy number variation analysis did not confidently distinguish between D and CE exon deletion versus rearrangement. The targeted exome sequencing strategy employed extended the range of blood group genotypes detected compared with single nucleotide polymorphism typing. This single-test format included detection of complex MNS hybrid cases and, with copy number variation analysis, defined RH hybrid genes along with the RHCE*C allele hitherto difficult to resolve by variant detection. The approach is economical compared with whole-genome sequencing and is suitable for a red blood cell reference laboratory setting. © 2017 AABB.

  2. Next-Generation Sequence Analysis of the Genome of RFHVMn, the Macaque Homolog of Kaposi's Sarcoma (KS)-Associated Herpesvirus, from a KS-Like Tumor of a Pig-Tailed Macaque

    PubMed Central

    Bruce, A. Gregory; Ryan, Jonathan T.; Thomas, Mathew J.; Peng, Xinxia; Grundhoff, Adam; Tsai, Che-Chung

    2013-01-01

    The complete sequence of retroperitoneal fibromatosis-associated herpesvirus Macaca nemestrina (RFHVMn), the pig-tailed macaque homolog of Kaposi's sarcoma-associated herpesvirus (KSHV), was determined by next-generation sequence analysis of a Kaposi's sarcoma (KS)-like macaque tumor. Colinearity of genes was observed with the KSHV genome, and the core herpesvirus genes had strong sequence homology to the corresponding KSHV genes. RFHVMn lacked homologs of open reading frame 11 (ORF11) and KSHV ORFs K5 and K6, which appear to have been generated by duplication of ORFs K3 and K4 after the divergence of KSHV and RFHV. RFHVMn contained positional homologs of all other unique KSHV genes, although some showed limited sequence similarity. RFHVMn contained a number of candidate microRNA genes. Although there was little sequence similarity with KSHV microRNAs, one candidate contained the same seed sequence as its positional homolog, kshv-miR-K12-10a, suggesting functional overlap. RNA transcript splicing was highly conserved between RFHVMn and KSHV, and strong sequence conservation was noted in specific promoters and putative origins of replication, predicting important functional similarities. Sequence comparisons indicated that RFHVMn and KSHV developed in long-term synchrony with the evolution of their hosts, and both viruses phylogenetically group within the RV1 lineage of Old World primate rhadinoviruses. RFHVMn is the closest homolog of KSHV to be completely sequenced and the first sequenced RV1 rhadinovirus homolog of KSHV from a nonhuman Old World primate. The strong genetic and sequence similarity between RFHVMn and KSHV, coupled with similarities in biology and pathology, demonstrates that RFHVMn infection in macaques offers an important and relevant model for the study of KSHV in humans. PMID:24109218

  3. Prediction of TF target sites based on atomistic models of protein-DNA complexes

    PubMed Central

    Angarica, Vladimir Espinosa; Pérez, Abel González; Vasconcelos, Ana T; Collado-Vides, Julio; Contreras-Moreira, Bruno

    2008-01-01

    Background The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF specific recognition rely on the knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence. Results Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information regarding the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. We first measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Secondly, the method is applied to two homology models, finding that sampling of interface side-chain rotamers remarkably improves the results. Thirdly, the algorithm is compared with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models. Conclusion Our results demonstrate that atomic-detail structural information can be feasibly used to predict TF binding sites. The computational method presented here is universal and might be applied to other systems involving protein-DNA recognition. PMID:18922190

  4. Versatility and Invariance in the Evolution of Homologous Heteromeric Interfaces

    PubMed Central

    Andreani, Jessica; Faure, Guilhem; Guerois, Raphaël

    2012-01-01

    Evolutionary pressures act on protein complex interfaces so that they preserve their complementarity. Nonetheless, the elementary interactions which compose the interface are highly versatile throughout evolution. Understanding and characterizing interface plasticity across evolution is a fundamental issue which could provide new insights into protein-protein interaction prediction. Using a database of 1,024 pairs of close and remote heteromeric structural interologs, we studied protein-protein interactions from a structural and evolutionary point of view. We systematically and quantitatively analyzed the conservation of different types of interface contacts. Our study highlights astonishing plasticity regarding polar contacts at complex interfaces. It also reveals that up to a quarter of the residues switch out of the interface when comparing two homologous complexes. Despite such versatility, we identify two important interface descriptors which correlate with increased conservation in the evolution of interfaces: apolar patches and contacts surrounding anchor residues. These observations hold true even when restricting the dataset to transiently formed complexes. We show that a combination of six features related either to sequence or to geometric properties of interfaces can be used to rank positions likely to share similar contacts between two interologs. Altogether, our analysis provides important leads for extracting meaningful information from multiple sequence alignments of conserved binding partners and for discriminating near-native interfaces using evolutionary information. PMID:22952442

  5. Detecting false positive sequence homology: a machine learning approach.

    PubMed

    Fujimoto, M Stanley; Suvorov, Anton; Jensen, Nicholas O; Clement, Mark J; Bybee, Seth M

    2016-02-24

    Accurate detection of homologous relationships among biological sequences (DNA or amino acid) from different organisms is an important and often difficult task that is essential to various evolutionary studies, ranging from building phylogenies to predicting functional gene annotations. Many existing heuristic tools, most commonly based on bidirectional BLAST searches, are used to identify homologous genes and combine them into two fundamentally distinct classes: orthologs and paralogs. Because these methods rely only on heuristic filtering based on significance score cutoffs and have no cluster post-processing tools available, they can often produce clusters containing unrelated (non-homologous) sequences. Sequencing data extracted from incomplete genome/transcriptome assemblies, whether originating from low-coverage sequencing or produced by de novo processes without a reference genome, are therefore susceptible to high false positive rates of homology detection. In this paper we develop biologically informative features that can be extracted from multiple sequence alignments of putative homologous genes (orthologs and paralogs) and further used, in the context of guided experimentation, to detect false positive outcomes. We demonstrate that our machine learning method, trained on both known homology clusters obtained from OrthoDB and randomly generated sequence alignments (non-homologs), successfully identifies apparent false positives inferred by heuristic algorithms, especially among proteomes recovered from low-coverage RNA-seq data. About 42% and 25% of the putative homologies predicted by InParanoid and HaMStR, respectively, were classified as false positives on the experimental data set. Our process improves the quality of output from other clustering algorithms by providing a novel post-processing method that is both fast and efficient at removing low-quality clusters of putative homologous genes recovered by heuristic-based approaches.

  6. Exploiting three kinds of interface propensities to identify protein binding sites.

    PubMed

    Liu, Bin; Wang, Xiaolong; Lin, Lei; Dong, Qiwen; Wang, Xuan

    2009-08-01

    Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. In this study, we present a building block of proteins called order profiles, which uses the evolutionary information in protein sequence frequency profiles, and apply this building block to produce a class of propensities called order profile interface propensities. For comparison, we revisit the use of residue interface propensities and binary profile interface propensities for protein binding site prediction. Each kind of propensity, combined with sequence profiles and accessible surface areas, is input into a support vector machine (SVM). When tested on four types of complexes (hetero-permanent, hetero-transient, homo-permanent and homo-transient complexes), the experimental results show that order profile interface propensities perform better than residue interface propensities and binary profile interface propensities. Therefore, the order profile is a suitable profile-level building block of protein sequences and can be widely used in many tasks of computational biology, such as sequence alignment, domain boundary prediction, the design of knowledge-based potentials and protein remote homology detection.
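    Residue interface propensities, one of the baselines revisited in this record, are commonly computed as the ratio of a residue type's frequency at the interface to its frequency on the protein surface, often on a log scale. The sketch assumes that definition, with interface residues a subset of surface residues; the function name and toy sequences are hypothetical, and it does not implement the paper's order profile propensities or its SVM step.

```python
import math
from collections import Counter


def interface_propensity(interface_residues, surface_residues):
    """Log-ratio propensity per residue type: ln(f_interface / f_surface).

    Assumes every interface residue is also counted among the surface residues.
    Residue types seen on the surface but never at the interface get -inf.
    """
    iface = Counter(interface_residues)
    surf = Counter(surface_residues)
    n_iface = sum(iface.values())
    n_surf = sum(surf.values())
    props = {}
    for aa, count in surf.items():
        f_iface = iface.get(aa, 0) / n_iface
        f_surf = count / n_surf
        props[aa] = math.log(f_iface / f_surf) if f_iface > 0 else float("-inf")
    return props


# Toy example (hypothetical one-letter residue lists)
props = interface_propensity("WWLY", "WWLYKKEEDDSS")
```

    Here W is three times more frequent at the interface than on the surface (0.5 versus 1/6), so its propensity is ln 3 > 0, while K never occurs at the interface. Real implementations usually add pseudocounts instead of returning -inf.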

  7. BCL2 oncogene translocation is mediated by a chi-like consensus

    PubMed Central

    1992-01-01

    Examination of 64 translocations involving the major breakpoint region (mbr) of the BCL2 oncogene and the immunoglobulin heavy chain locus identified three short (14, 16, and 18 bp) segments within the mbr at which translocations occurred with very high frequency. Each of these clusters was associated with a 15-bp region of sequence homology, the principal one containing an octamer related to chi, the procaryotic activator of recombination. The presence of short deletions and N nucleotide additions at the breakpoints, as well as involvement of JH and DH coding regions, suggested that these sequences served as signals capable of interacting with the VDJ recombinase complex, even though no homology with the traditional heptamer/spacer/nonamer (IgRSS) existed. Furthermore, the BCL2 signal sequences were employed in a bidirectional fashion and could mediate recombination of one mbr region with another. Segments homologous to the BCL2 signal sequences flanked individual members of the XP family of diversity gene segments, which were themselves highly overrepresented in the reciprocal products (18q-) of BCL2 translocation. We propose that the chi-like signal sequences of BCL2 represent a distinct class of recognition sites for the recombinase complex, responsible for initiating interactions between regions of DNA separated by great distances, and that BCL2 translocation begins by a recombination event between mbr and DXP chi signals. Since recombinant joints containing chi, not IgRSS, occur in brain cells expressing RAG-1 (Matsuoka, M., F. Nagawa, K. Okazaki, L. Kingsbury, K. Yoshida, U. Muller, D. T. Larue, J. A. Winer, and H. Sakano. 1991. Science [Wash. DC]. 254:81; reference 1), we further suggest that the product of this gene could mediate both BCL2 translocation and the first step of normal DJ assembly through the creation of chi joints, rather than signal or coding joints. PMID:1588282

  8. Prokaryotic Caspase Homologs: Phylogenetic Patterns and Functional Characteristics Reveal Considerable Diversity

    PubMed Central

    Asplund-Samuelsson, Johannes; Bergman, Birgitta; Larsson, John

    2012-01-01

    Caspases accomplish initiation and execution of apoptosis, a programmed cell death process specific to metazoans. The existence of prokaryotic caspase homologs, termed metacaspases, has been known for slightly more than a decade. Despite their potential connection to the evolution of programmed cell death in eukaryotes, the phylogenetic distribution and functions of these prokaryotic metacaspase sequences are largely uncharted, while a few experiments imply involvement in programmed cell death. Aiming at providing a more detailed picture of prokaryotic caspase homologs, we applied a computational approach based on Hidden Markov Model search profiles to identify and functionally characterize putative metacaspases in bacterial and archaeal genomes. Of the 1463 genomes analyzed, only 267 (18%) were found to contain putative metacaspases, but their taxonomic distribution included most prokaryotic phyla and a few archaea (Euryarchaeota). Metacaspases were particularly abundant in Alphaproteobacteria, Deltaproteobacteria and Cyanobacteria, which harbor many morphologically and developmentally complex organisms, and a distinct correlation was found between abundance and phenotypic complexity in Cyanobacteria. Notably, Bacillus subtilis and Escherichia coli, known to undergo genetically regulated autolysis, lacked metacaspases. Pfam domain architecture analysis combined with operon identification revealed rich and varied configurations among the metacaspase sequences. These imply roles in programmed cell death, but also, e.g., in signaling, various enzymatic activities and protein modification. Together our data show a wide and scattered distribution of caspase homologs in prokaryotes with structurally and functionally diverse sub-groups, and with a potentially intriguing evolutionary role. These features will help delineate future characterizations of death pathways in prokaryotes. PMID:23185476

  9. Process of labeling specific chromosomes using recombinant repetitive DNA

    DOEpatents

    Moyzis, R.K.; Meyne, J.

    1988-02-12

    Chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family members and consensus sequences of the repetitive DNA families for the chromosome preferential sequences. The selected low homology regions are then hybridized with chromosomes to determine those low homology regions hybridized with a specific chromosome under normal stringency conditions.

  10. Human structural variation: mechanisms of chromosome rearrangements

    PubMed Central

    Weckselblatt, Brooke; Rudd, M. Katharine

    2015-01-01

    Chromosome structural variation (SV) is a normal part of variation in the human genome, but some classes of SV can cause neurodevelopmental disorders. Analysis of the DNA sequence at SV breakpoints can reveal mutational mechanisms and risk factors for chromosome rearrangement. Large-scale SV breakpoint studies have become possible recently owing to advances in next-generation sequencing (NGS) including whole-genome sequencing (WGS). These findings have shed light on complex forms of SV such as triplications, inverted duplications, insertional translocations, and chromothripsis. Sequence-level breakpoint data resolve SV structure and determine how genes are disrupted, fused, and/or misregulated by breakpoints. Recent improvements in breakpoint sequencing have also revealed non-allelic homologous recombination (NAHR) between paralogous long interspersed nuclear element (LINE) or human endogenous retrovirus (HERV) repeats as a cause of deletions, duplications, and translocations. This review covers the genomic organization of simple and complex constitutional SVs, as well as the molecular mechanisms of their formation. PMID:26209074

  11. RAP80, ubiquitin and SUMO in the DNA damage response.

    PubMed

    Lombardi, Patrick M; Matunis, Michael J; Wolberger, Cynthia

    2017-08-01

    A decade has passed since the first reported connection between RAP80 and BRCA1 in DNA double-strand break repair. Despite the initial identification of RAP80 as a factor localizing BRCA1 to DNA double-strand breaks and potentially promoting homologous recombination, there is increasing evidence that RAP80 instead suppresses homologous recombination to fine-tune the balance of competing DNA repair processes during the S/G2 phase of the cell cycle. RAP80 opposes homologous recombination by inhibiting DNA end-resection and sequestering BRCA1 into the BRCA1-A complex. Ubiquitin and SUMO modifications of chromatin at DNA double-strand breaks recruit RAP80, which contains distinct sequence motifs that recognize ubiquitin and SUMO. Here, we review RAP80's role in repressing homologous recombination at DNA double-strand breaks and how this role is facilitated by its ability to bind ubiquitin and SUMO modifications.

  12. Biochemical identification of Argonaute 2 as the sole protein required for RNA-induced silencing complex activity

    PubMed Central

    Rand, Tim A.; Ginalski, Krzysztof; Grishin, Nick V.; Wang, Xiaodong

    2004-01-01

    RNA interference is carried out by the small double-stranded RNA-induced silencing complex (RISC). The RISC-bound small RNA guides the RISC complex to identify and cleave mRNAs with complementary sequences. The proteins that make up the RISC complex and cleave mRNA have not been unequivocally defined. Here, we report the biochemical purification of RISC activity to homogeneity from Drosophila Schneider 2 cell extracts. Argonaute 2 (Ago-2) is the sole protein component present in the purified, functional RISC. By using a bioinformatics method that combines sequence-profile analysis with predicted protein secondary structure, we found homology between the PIWI domain of Ago-2 and endonuclease V and identified potential active-site amino acid residues within the PIWI domain of Ago-2. PMID:15452342

  13. Biochemical identification of Argonaute 2 as the sole protein required for RNA-induced silencing complex activity.

    PubMed

    Rand, Tim A; Ginalski, Krzysztof; Grishin, Nick V; Wang, Xiaodong

    2004-10-05

    RNA interference is carried out by the small double-stranded RNA-induced silencing complex (RISC). The RISC-bound small RNA guides the RISC complex to identify and cleave mRNAs with complementary sequences. The proteins that make up the RISC complex and cleave mRNA have not been unequivocally defined. Here, we report the biochemical purification of RISC activity to homogeneity from Drosophila Schneider 2 cell extracts. Argonaute 2 (Ago-2) is the sole protein component present in the purified, functional RISC. By using a bioinformatics method that combines sequence-profile analysis with predicted protein secondary structure, we found homology between the PIWI domain of Ago-2 and endonuclease V and identified potential active-site amino acid residues within the PIWI domain of Ago-2.

  14. Structures of Arg- and Gln-type bacterial cysteine dioxygenase homologs: Arg- and Gln-type Bacterial CDO Homologs

    DOE PAGES

    Driggers, Camden M.; Hartman, Steven J.; Karplus, P. Andrew

    2015-01-01

    In some bacteria, cysteine is converted to cysteine sulfinic acid by cysteine dioxygenases (CDO) that are only ~15–30% identical in sequence to mammalian CDOs. Among bacterial proteins having this range of sequence similarity to mammalian CDO are some that conserve an active site Arg residue (“Arg-type” enzymes) and some having a Gln substituted for this Arg (“Gln-type” enzymes). Here, we describe a structure from each of these enzyme types by analyzing structures originally solved by structural genomics groups but not published: a Bacillus subtilis “Arg-type” enzyme that has cysteine dioxygenase activity (BsCDO), and a Ralstonia eutropha “Gln-type” CDO homolog of uncharacterized activity (ReCDOhom). The BsCDO active site is well conserved with mammalian CDO, and a cysteine complex captured in the active site confirms that the cysteine binding mode is also similar. The ReCDOhom structure reveals a new active site Arg residue that is hydrogen bonding to an iron-bound diatomic molecule we have interpreted as dioxygen. Notably, the Arg position is not compatible with the mode of Cys binding seen in both rat CDO and BsCDO. As sequence alignments show that this newly discovered active site Arg is well conserved among “Gln-type” CDO enzymes, we conclude that the “Gln-type” CDO homologs are not authentic CDOs but will have substrate specificity more similar to 3-mercaptopropionate dioxygenases.
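    Figures such as "~15–30% identical in sequence" are percent identities computed over a pairwise alignment. As a minimal, hypothetical sketch (not the authors' procedure), one common convention counts identical residue pairs over the alignment columns where neither sequence has a gap; other conventions divide by the full alignment length or by the length of the shorter sequence, and give different numbers for the same alignment. The function name `percent_identity` and the `-` gap character are assumptions for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy alignment: 6 gap-free columns, 4 of them identical
pid = percent_identity("MKT-LLV", "MRTALIV")
```

    For the toy alignment the result is 4/6, about 66.7%; dividing by the full alignment length of 7 would instead give about 57.1%, which is why reported identity ranges should always be read together with the convention used.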

  15. Functional region prediction with a set of appropriate homologous sequences-an index for sequence selection by integrating structure and sequence information with spatial statistics

    PubMed Central

    2012-01-01

    Background The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions. Results We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. 
The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence-based methods. Conclusions Appropriate homologous sequences are selected automatically and objectively by the index. Such sequence selection improved the performance of functional region prediction. As far as we know, this is the first approach in which spatial statistics have been applied to protein analyses. Such integration of structure and sequence information would be useful for other bioinformatics problems. PMID:22643026
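    The abstract does not say which spatial statistic the index is built on. To illustrate the general idea only, the sketch below computes Moran's I, a standard spatial autocorrelation statistic, for per-residue conservation scores. Its weights link residues whose coordinates (for example Cα atoms) lie within a distance cutoff. The function name, the 8 Å default cutoff and the toy data are assumptions. This is not the authors' index.

```python
import math

def morans_i(scores, coords, cutoff=8.0):
    """Moran's I for per-residue conservation scores on a 3D structure.

    Assumed weighting (illustrative): w_ij = 1 if residues i != j lie within
    `cutoff` angstroms of each other, else 0. Values near +1 mean similar
    scores are spatially clustered; values below 0 mean they are dispersed.
    """
    n = len(scores)
    mean = sum(scores) / n
    dev = [s - mean for s in scores]
    num = 0.0      # sum over neighbor pairs of dev_i * dev_j
    w_total = 0.0  # total weight = number of ordered neighbor pairs
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= cutoff:
                num += dev[i] * dev[j]
                w_total += 1.0
    den = sum(d * d for d in dev)
    if w_total == 0 or den == 0:
        return 0.0  # undefined: no neighbor pairs or no variation in scores
    return (n / w_total) * (num / den)
```

    The statistic is larger when highly conserved residues sit next to each other on the structure. This matches the abstract's statement that higher degrees of clustering give larger index scores. Each candidate set of homologous sequences would yield its own conservation scores, and the sets could then be ranked by such a value.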

  16. Identification of SHIP-1 and SHIP-2 homologs in channel catfish, Ictalurus punctatus

    USDA-ARS's Scientific Manuscript database

    Src homology domain 2 (SH2) domain-containing inositol 5’-phosphatases (SHIP) proteins have diverse roles in signal transduction. SHIP-1 and SHIP-2 homologs were identified in channel catfish, Ictalurus punctatus, based on sequence homology to murine and human SHIP sequences. Full-length cDNAs for ...

  17. The impact of CRISPR repeat sequence on structures of a Cas6 protein-RNA complex

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Ruiying; Zheng, Han; Preamplume, Gan

    The repeat-associated mysterious proteins (RAMPs) comprise the most abundant family of proteins involved in prokaryotic immunity against invading genetic elements conferred by the clustered regularly interspaced short palindromic repeat (CRISPR) system. Cas6 is one of the first characterized RAMP proteins and is a key enzyme required for CRISPR RNA maturation. Despite a strong structural homology with other RAMP proteins that bind hairpin RNA, Cas6 distinctly recognizes single-stranded RNA. Previous structural and biochemical studies show that Cas6 captures the 5' end while cleaving the 3' end of the CRISPR RNA. Here, we describe three structures and complementary biochemical analysis of a noncatalytic Cas6 homolog from Pyrococcus horikoshii bound to CRISPR repeat RNA of different sequences. Our study confirms the specificity of the Cas6 protein for single-stranded RNA and further reveals the importance of the bases at Positions 5-7 in Cas6-RNA interactions. Substitutions of these bases result in structural changes in the protein-RNA complex including its oligomerization state.

  18. A low-complexity add-on score for protein remote homology search with COMER.

    PubMed

    Margelevicius, Mindaugas

    2018-06-15

    Protein sequence alignment forms the basis for comparative modeling, the most reliable approach to protein structure prediction, among many other applications. Alignment between sequence families, or profile-profile alignment, is one of the most sensitive means of homology detection, if not the most sensitive, but it still needs improvement. We aim to improve the quality of profile-profile alignments, and the homology-detection sensitivity they provide, by refining profile-profile substitution scores. We have developed a new score that represents an additional component of profile-profile substitution scores. A comprehensive evaluation shows that the new add-on score statistically significantly improves both the sensitivity and the alignment quality of the COMER method. We discuss why the score leads to the improvement, as well as its nearly optimal computational complexity, which makes it easy to implement in any profile-profile alignment method. An implementation of the add-on score in the open-source COMER software and data are available at https://sourceforge.net/projects/comer. The COMER software is also available on Github at https://github.com/minmarg/comer and as a Docker image (minmar/comer). Supplementary data are available at Bioinformatics online.
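    The abstract does not define COMER's add-on score. For orientation, one simple form of profile-profile substitution score is the expected substitution score between two profile columns, S(i, j) = Σ_a Σ_b p_i(a) q_j(b) s(a, b). The sketch below implements that baseline with a hypothetical two-letter alphabet and toy matrix. It is not COMER's scoring function.

```python
def column_score(p, q, subst):
    """Expected substitution score between two profile columns.

    p, q: dicts mapping residue -> probability for one column of each profile.
    subst: nested dict of pairwise substitution scores, subst[a][b].
    """
    return sum(pa * qb * subst[a][b]
               for a, pa in p.items()
               for b, qb in q.items())

def score_matrix(prof1, prof2, subst):
    """All column-pair scores, the input a dynamic-programming aligner
    (for example Smith-Waterman) would consume."""
    return [[column_score(p, q, subst) for q in prof2] for p in prof1]
```

    An add-on component, as described in the abstract, would contribute a further term to each entry of such a matrix before alignment.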

  19. Central region component1, a novel synaptonemal complex component, is essential for meiotic recombination initiation in rice.

    PubMed

    Miao, Chunbo; Tang, Ding; Zhang, Honggen; Wang, Mo; Li, Yafei; Tang, Shuzhu; Yu, Hengxiu; Gu, Minghong; Cheng, Zhukuan

    2013-08-01

    In meiosis, homologous recombination entails programmed DNA double-strand break (DSB) formation and synaptonemal complex (SC) assembly coupled with DSB repair. Although SCs display extensive structural conservation among species, their known components are poorly conserved at the sequence level. Here, we identified a novel SC component, designated central region component1 (CRC1), in rice (Oryza sativa). CRC1 colocalizes with ZEP1, the rice SC transverse filament protein, to the central region of SCs in a mutually dependent fashion. Consistent with this colocalization, CRC1 interacts with ZEP1 in yeast two-hybrid assays. CRC1 is orthologous to Saccharomyces cerevisiae pachytene checkpoint2 (Pch2) and Mus musculus thyroid receptor-interacting protein13 (TRIP13) and may be a conserved SC component. Additionally, we provide evidence that CRC1 is essential for meiotic DSB formation. CRC1 interacts with homologous pairing aberration in rice meiosis1 (PAIR1) in vitro, suggesting that these proteins act as a complex to promote DSB formation. PAIR2, the rice ortholog of budding yeast homolog pairing1, is required for homologous chromosome pairing. We found that CRC1 is also essential for the recruitment of PAIR2 onto meiotic chromosomes. The roles of CRC1 identified here have not been reported for Pch2 or TRIP13.

  20. CENTRAL REGION COMPONENT1, a Novel Synaptonemal Complex Component, Is Essential for Meiotic Recombination Initiation in Rice

    PubMed Central

    Miao, Chunbo; Tang, Ding; Zhang, Honggen; Wang, Mo; Li, Yafei; Tang, Shuzhu; Yu, Hengxiu; Gu, Minghong; Cheng, Zhukuan

    2013-01-01

    In meiosis, homologous recombination entails programmed DNA double-strand break (DSB) formation and synaptonemal complex (SC) assembly coupled with DSB repair. Although SCs display extensive structural conservation among species, their known components are poorly conserved at the sequence level. Here, we identified a novel SC component, designated CENTRAL REGION COMPONENT1 (CRC1), in rice (Oryza sativa). CRC1 colocalizes with ZEP1, the rice SC transverse filament protein, to the central region of SCs in a mutually dependent fashion. Consistent with this colocalization, CRC1 interacts with ZEP1 in yeast two-hybrid assays. CRC1 is orthologous to Saccharomyces cerevisiae pachytene checkpoint2 (Pch2) and Mus musculus THYROID RECEPTOR-INTERACTING PROTEIN13 (TRIP13) and may be a conserved SC component. Additionally, we provide evidence that CRC1 is essential for meiotic DSB formation. CRC1 interacts with HOMOLOGOUS PAIRING ABERRATION IN RICE MEIOSIS1 (PAIR1) in vitro, suggesting that these proteins act as a complex to promote DSB formation. PAIR2, the rice ortholog of budding yeast homolog pairing1, is required for homologous chromosome pairing. We found that CRC1 is also essential for the recruitment of PAIR2 onto meiotic chromosomes. The roles of CRC1 identified here have not been reported for Pch2 or TRIP13. PMID:23943860

  1. Biosynthesis of Lipoic Acid in Arabidopsis: Cloning and Characterization of the cDNA for Lipoic Acid Synthase

    PubMed Central

    Yasuno, Rie; Wada, Hajime

    1998-01-01

    Lipoic acid is a coenzyme that is essential for the activity of enzyme complexes such as those of pyruvate dehydrogenase and glycine decarboxylase. We report here the isolation and characterization of LIP1 cDNA for lipoic acid synthase of Arabidopsis. The Arabidopsis LIP1 cDNA was isolated using an expressed sequence tag homologous to the lipoic acid synthase of Escherichia coli. This cDNA was shown to code for Arabidopsis lipoic acid synthase by its ability to complement a lipA mutant of E. coli defective in lipoic acid synthase. DNA-sequence analysis of the LIP1 cDNA revealed an open reading frame predicting a protein of 374 amino acids. Comparisons of the deduced amino acid sequence with those of E. coli and yeast lipoic acid synthase homologs showed a high degree of sequence similarity and the presence of a leader sequence presumably required for import into the mitochondria. Southern-hybridization analysis suggested that LIP1 is a single-copy gene in Arabidopsis. Western analysis with an antibody against lipoic acid synthase demonstrated that this enzyme is located in the mitochondrial compartment in Arabidopsis cells as a 43-kD polypeptide. PMID:9808738
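    Finding an open reading frame in a cDNA means scanning for an in-frame ATG that is followed by a stop codon. The sketch below returns the longest such ORF on the forward strand. It ignores the reverse strand and any ORF that runs off the end of the sequence, and the toy sequence is hypothetical. The protein length in amino acids is the ORF length in codons minus the stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq):
    """Longest ATG-initiated ORF on the forward strand as (start, end),
    0-based with `end` exclusive and including the stop codon; None if none."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens an ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if best is None or i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ATG downstream
    return best

def protein_length(orf):
    """Number of amino acids encoded, excluding the stop codon."""
    start, end = orf
    return (end - start) // 3 - 1
```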

  2. PCV: An Alignment Free Method for Finding Homologous Nucleotide Sequences and its Application in Phylogenetic Study.

    PubMed

    Kumar, Rajnish; Mishra, Bharat Kumar; Lahiri, Tapobrata; Kumar, Gautam; Kumar, Nilesh; Gupta, Rahul; Pal, Manoj Kumar

    2017-06-01

    Retrieving homologous nucleotide sequences from a given sequence database with existing alignment techniques is common practice. These techniques depend on local alignment and scoring matrices, whose reliability is limited by computational complexity and accuracy. To address this, this work offers a novel numerical representation of genes that can help divide the data space into smaller partitions, enabling the construction of a search tree. In this context, this paper introduces a 36-dimensional Periodicity Count Value (PCV) that represents a particular nucleotide sequence and is adapted from the stochastic model of Kolekar et al. (American Institute of Physics 1298:307-312, 2010. doi: 10.1063/1.3516320). The PCV construct uses information on the physicochemical properties of nucleotides and their positional distribution pattern within a gene. The PCV representation of a gene was observed to reduce the computational cost of calculating the distance between a pair of genes while remaining consistent with existing methods. The validity of the PCV-based method was further tested by using it to construct molecular phylogenies and comparing these with phylogenies built using existing sequence alignment methods.
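    The abstract does not give the PCV formula, so the sketch below is a generic stand-in for alignment-free comparison and not PCV itself. It represents each sequence as a fixed-length vector of 2-mer relative frequencies and compares sequences by Euclidean distance, so no alignment or scoring matrix is needed. The choice of k = 2 and of Euclidean distance are assumptions.

```python
import math
from itertools import product

def kmer_vector(seq, k=2):
    """Relative frequencies of all 4**k DNA k-mers, in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing N or other non-ACGT symbols
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(kmers)

def kmer_distance(a, b, k=2):
    """Euclidean distance between the k-mer frequency vectors of two sequences."""
    return math.dist(kmer_vector(a, k), kmer_vector(b, k))
```

    Each vector has a fixed length, so every pairwise distance costs the same regardless of sequence length. The resulting distance matrix can be passed directly to a distance-based tree method.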

  3. A computational study of the chemokine receptor CXCR1 bound with interleukin-8

    NASA Astrophysics Data System (ADS)

    Wang, Yang; Severin Lupala, Cecylia; Wang, Ting; Li, Xuanxuan; Yun, Ji-Hye; Park, Jae-hyun; Jin, Zeyu; Lee, Weontae; Tan, Leihan; Liu, Haiguang

    2018-03-01

    CXCR1 is a G-protein coupled receptor that transduces signals from chemokines, in particular interleukin-8 (IL8). This study combines homology modeling and molecular dynamics simulation to study the structure of the CXCR1-IL8 complex. Using the CXCR4-vMIP-II crystal structure as the homologous template, the CXCR1-IL8 complex structure was constructed and then refined using all-atom molecular dynamics simulations. Through extensive simulations, CXCR1-IL8 binding poses were investigated in detail. Furthermore, the role of the N-terminus of the CXCR1 receptor was studied by comparing four complex models differing in their N-terminal sequences. The results indicate that the receptor N-terminus significantly affects IL8 binding: with a shorter N-terminal domain, the binding of IL8 to CXCR1 becomes unstable. The homology modeling and simulations also reveal the key receptor-ligand residues involved in the electrostatic interactions known to be vital for complex formation. Project supported by the National Natural Science Foundation of China (Grant Nos. 11575021, U1530401, and U1430237) and the National Research Foundation of Korea (Grant Nos. NRF-2017R1A2B2008483 and NRF-2016R1A6A3A04010213).

  4. Behavior of restriction–modification systems as selfish mobile elements and their impact on genome evolution

    PubMed Central

    Kobayashi, Ichizo

    2001-01-01

    Restriction–modification (RM) systems are composed of genes that encode a restriction enzyme and a modification methylase. RM systems sometimes behave as discrete units of life, like viruses and transposons. RM complexes attack invading DNA that has not been properly modified and thus may serve as a tool of defense for bacterial cells. However, any threat to their maintenance, such as a challenge by a competing genetic element (an incompatible plasmid or an allelic homologous stretch of DNA, for example) can lead to cell death through restriction breakage in the genome. This post-segregational or post-disturbance cell killing may provide the RM complexes (and any DNA linked with them) with a competitive advantage. There is evidence that they have undergone extensive horizontal transfer between genomes, as inferred from their sequence homology, codon usage bias and GC content difference. They are often linked with mobile genetic elements such as plasmids, viruses, transposons and integrons. The comparison of closely related bacterial genomes also suggests that, at times, RM genes themselves behave as mobile elements and cause genome rearrangements. Indeed some bacterial genomes that survived post-disturbance attack by an RM gene complex in the laboratory have experienced genome rearrangements. The avoidance of some restriction sites by bacterial genomes may result from selection by past restriction attacks. Both bacteriophages and bacteria also appear to use homologous recombination to cope with the selfish behavior of RM systems. RM systems compete with each other in several ways. One is competition for recognition sequences in post-segregational killing. Another is super-infection exclusion, that is, the killing of the cell carrying an RM system when it is infected with another RM system of the same regulatory specificity but of a different sequence specificity. 
The capacity of RM systems to act as selfish, mobile genetic elements may underlie the structure and function of RM enzymes. PMID:11557807
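    Avoidance of restriction sites is usually quantified by comparing how often a recognition sequence occurs with how often chance alone would produce it. The sketch below uses the simplest null model, independent bases drawn with the genome's own base composition, and counts overlapping matches on one strand. Published analyses often use higher-order Markov models instead. The GATC site and the toy sequences are illustrative only.

```python
def site_obs_exp(genome, site):
    """Observed/expected ratio for a recognition site on one strand.

    Expected count assumes independent bases drawn with the genome's own
    base frequencies (a zero-order Markov model). Overlapping matches count.
    """
    genome = genome.upper()
    site = site.upper()
    n = len(genome)
    windows = n - len(site) + 1
    if windows <= 0:
        raise ValueError("genome shorter than site")
    freq = {b: genome.count(b) / n for b in "ACGT"}
    expected = float(windows)
    for b in site:
        expected *= freq.get(b, 0.0)
    observed = sum(1 for i in range(windows) if genome[i:i + len(site)] == site)
    return observed / expected if expected > 0 else float("nan")
```

    A ratio well below 1 for a real genome would be consistent with the selection against restriction sites described above. A ratio near 1 suggests no such bias under this model.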

  5. Behavior of restriction-modification systems as selfish mobile elements and their impact on genome evolution.

    PubMed

    Kobayashi, I

    2001-09-15

    Restriction-modification (RM) systems are composed of genes that encode a restriction enzyme and a modification methylase. RM systems sometimes behave as discrete units of life, like viruses and transposons. RM complexes attack invading DNA that has not been properly modified and thus may serve as a tool of defense for bacterial cells. However, any threat to their maintenance, such as a challenge by a competing genetic element (an incompatible plasmid or an allelic homologous stretch of DNA, for example) can lead to cell death through restriction breakage in the genome. This post-segregational or post-disturbance cell killing may provide the RM complexes (and any DNA linked with them) with a competitive advantage. There is evidence that they have undergone extensive horizontal transfer between genomes, as inferred from their sequence homology, codon usage bias and GC content difference. They are often linked with mobile genetic elements such as plasmids, viruses, transposons and integrons. The comparison of closely related bacterial genomes also suggests that, at times, RM genes themselves behave as mobile elements and cause genome rearrangements. Indeed some bacterial genomes that survived post-disturbance attack by an RM gene complex in the laboratory have experienced genome rearrangements. The avoidance of some restriction sites by bacterial genomes may result from selection by past restriction attacks. Both bacteriophages and bacteria also appear to use homologous recombination to cope with the selfish behavior of RM systems. RM systems compete with each other in several ways. One is competition for recognition sequences in post-segregational killing. Another is super-infection exclusion, that is, the killing of the cell carrying an RM system when it is infected with another RM system of the same regulatory specificity but of a different sequence specificity. 
The capacity of RM systems to act as selfish, mobile genetic elements may underlie the structure and function of RM enzymes.

  6. [Study on the genetic difference of SEO type Hantaviruses].

    PubMed

    Zhang, X; Zhou, S; Wang, H; Hu, J; Guan, Z; Liu, H

    2000-10-01

    This study aimed to determine the genetic types of Hantaviruses carried by rodents in Beijing and the differences among them, and to further explore the source of infection. Hantavirus RNA, isolated from the lungs of rodents captured in Beijing that tested positive for Hantavirus antigens by frozen sectioning and immunofluorescence assay, was reverse-transcribed and amplified by PCR with Hantavirus-specific primers. Five PCR products were detected and sequenced, yielding 300 bp of M-segment sequence (nt 2003-2302, numbered according to the cDNA of the Seoul 8039 strain). Nucleotide sequence homology showed that they were sequences of SEO-type Hantavirus. Compared with SEO-type Hantavirus, the nucleotide sequence homology of these samples was more than 94% and the amino acid sequence homology more than 98%. Compared with HNT-type Hantavirus, the nucleotide sequence homology was less than 72% and the amino acid sequence homology less than 81%. As in other SEO-type Hantaviruses, their nucleotide sequences and deduced amino acid sequences were highly conserved. Phylogenetic tree analysis showed that the five viruses could be divided into at least 4 branches. It is likely that at least two SEO subtypes, comprising 4 branches, were circulating in Beijing.
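    The abstract reports nucleotide and amino acid homology as percentages but does not state how they were computed. One common definition, assumed here, is percent identity over the ungapped columns of a pairwise alignment. The sketch also translates in-frame coding sequences with the standard genetic code. This shows why amino acid identity can exceed nucleotide identity: synonymous substitutions change a nucleotide but not the encoded amino acid. The sequences in the test are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def translate(cds):
    """Translate an in-frame coding sequence; unknown codons become 'X'."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, len(cds) - 2, 3))

def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences,
    ignoring columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)
```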

  7. Pea chloroplast DNA encodes homologues of Escherichia coli ribosomal subunit S2 and the beta'-subunit of RNA polymerase.

    PubMed Central

    Cozens, A L; Walker, J E

    1986-01-01

    The nucleotide sequence of a 4680-base segment of the pea chloroplast genome has been determined. It adjoins a sequence described elsewhere that encodes subunits of the F0 membrane domain of the ATP-synthase complex. The sequence contains a potential gene encoding a protein that is strongly related to the S2 polypeptide of Escherichia coli ribosomes. It also encodes an incomplete protein containing segments that are homologous to the beta'-subunit of E. coli RNA polymerase and to yeast RNA polymerases II and III. PMID:3530249

  8. [Sequence analysis of LEAFY homologous gene from Dendrobium moniliforme and application for identification of medicinal Dendrobium].

    PubMed

    Xing, Wen-Rui; Hou, Bei-Wei; Guan, Jing-Jiao; Luo, Jing; Ding, Xiao-Yu

    2013-04-01

    The LEAFY (LFY) homologous gene of Dendrobium moniliforme (L.) Sw. was cloned using new primers designed from the conserved region of known orchid LEAFY sequences. A partial LFY homologous gene was first cloned by conventional PCR, and the complete LFY homologous gene, DenLFY, was then obtained by TAIL-PCR. The complete DenLFY gene is 3,575 bp long and contains three exons and two introns. BLAST comparison of the exons of LFY homologous genes indicated that DenLFY has high identity with orchid LFY homologs, including the related fragment of PhalLFY (84%) in a Phalaenopsis hybrid cultivar, the LFY homologous gene in Oncidium (90%), and those in other orchids (over 80%). Maximum parsimony (MP) analysis placed Dendrobium as the sister to Oncidium and Phalaenopsis. Homology analysis showed that the C-terminal amino acids were highly conserved. When exons and introns were considered separately, the exons and the deduced amino acid sequence were good markers for functional research on the DenLFY gene, while the second intron can be used to authenticate Dendrobium, based on its length polymorphism between Dendrobium moniliforme and Dendrobium officinale.

  9. Induction and repair of DNA double strand breaks: the increasing spectrum of non-homologous end joining pathways.

    PubMed

    Mladenov, Emil; Iliakis, George

    2011-06-03

    A defining characteristic of damage induced in DNA by ionizing radiation (IR) is its clustered character, which leads to the formation of complex lesions that challenge the cellular repair mechanisms. The most widely investigated such complex lesion is the DNA double strand break (DSB). DSBs undermine chromatin stability and challenge the repair machinery because an intact template strand is lacking to assist restoration of integrity and sequence in the DNA molecule. Therefore, cells have evolved a sophisticated machinery to detect DSBs and coordinate a response on the basis of inputs from various sources. A central function of cellular responses to DSBs is the coordination of DSB repair. Two conceptually different mechanisms can in principle remove DSBs from the genome of cells of higher eukaryotes. Homologous recombination repair (HRR) uses a homologous DNA molecule as template and is therefore error-free; it functions preferentially in the S and G2 phases. Non-homologous end joining (NHEJ), on the other hand, simply restores DNA integrity by joining the two ends; it is error prone, because sequence is only fortuitously preserved, and it is active throughout the cell cycle. The basis of DSB repair pathway choice remains unknown, but cells of higher eukaryotes appear programmed to use NHEJ preferentially. Recent work suggests that when the canonical DNA-PK-dependent pathway of NHEJ (D-NHEJ) becomes compromised, an alternative NHEJ pathway (B-NHEJ), and not HRR, substitutes in a quasi-backup function. Here, we outline aspects of DSB induction by IR and review the mechanisms of DSB processing in cells of higher eukaryotes. We place particular emphasis on backup pathways of NHEJ and summarize their increasing significance in various cellular processes, as well as their potential contribution to carcinogenesis.

  10. Vital Roles of the Second DNA-binding Site of Rad52 Protein in Yeast Homologous Recombination

    PubMed Central

    Arai, Naoto; Kagawa, Wataru; Saito, Kengo; Shingu, Yoshinori; Mikawa, Tsutomu; Kurumizaka, Hitoshi; Shibata, Takehiko

    2011-01-01

    RecA/Rad51 proteins are essential in homologous DNA recombination and catalyze the ATP-dependent formation of D-loops from a single-stranded DNA and an internal homologous sequence in a double-stranded DNA. RecA and Rad51 require a “recombination mediator” to overcome the interference imposed by the prior binding of single-stranded binding protein/replication protein A to the single-stranded DNA. Rad52 is the prototype of recombination mediators, and the human Rad52 protein has two distinct DNA-binding sites: the first site binds to single-stranded DNA, and the second site binds to either double- or single-stranded DNA. We previously showed that yeast Rad52 extensively stimulates Rad51-catalyzed D-loop formation even in the absence of replication protein A, by forming a 2:1 stoichiometric complex with Rad51. However, the precise roles of Rad52 and Rad51 within the complex are unknown. In the present study, we constructed yeast Rad52 mutants in which the amino acid residues corresponding to the second DNA-binding site of the human Rad52 protein were replaced with either alanine or aspartic acid. We found that the second DNA-binding site is important for the yeast Rad52 function in vivo. Rad51-Rad52 complexes consisting of these Rad52 mutants were defective in promoting the formation of D-loops, and the ability of the complex to associate with double-stranded DNA was specifically impaired. Our studies suggest that Rad52 within the complex associates with double-stranded DNA to assist Rad51-mediated homologous pairing. PMID:21454474

  11. Cloning of Giardia lamblia heat shock protein HSP70 homologs: implications regarding origin of eukaryotic cells and of endoplasmic reticulum.

    PubMed Central

    Gupta, R S; Aitken, K; Falah, M; Singh, B

    1994-01-01

    The genes for two different 70-kDa heat shock protein (HSP70) homologs have been cloned and sequenced from the protozoan Giardia lamblia. On the basis of their sequence features, one of these genes corresponds to the cytoplasmic form of HSP70. The second gene, on the basis of its characteristic N-terminal hydrophobic signal sequence and C-terminal endoplasmic reticulum (ER) retention sequence (Lys-Asp-Glu-Leu), is the equivalent of ER-resident GRP78 or the BiP family of proteins. Phylogenetic trees based on HSP70 sequences indicate that the G. lamblia homologs have the deepest divergence among eukaryotic species. The identification of a GRP78 or BiP homolog in G. lamblia strongly suggests the existence of ER in this ancient eukaryote. Detailed phylogenetic analyses of HSP70 sequences by bootstrap neighbor-joining and maximum-parsimony methods show that the cytoplasmic and ER homologs form distinct subfamilies that evolved from a common eukaryotic ancestor by gene duplication that occurred very early in the evolution of eukaryotic cells. It is postulated that because of the essential "molecular chaperone" function of these proteins in translocation of other proteins across membranes, duplication of their genes accompanied the evolution of ER or nucleus in the eukaryotic cell ancestor. The presence in all eukaryotic cytoplasmic HSP70 homologs (including the cognate, heat-induced, and ER forms) of a number of autapomorphic sequence signatures that are not present in any prokaryotic or organellar homologs provides strong evidence for the monophyletic nature of the eukaryotic lineage. Further, all eukaryotic HSP70 homologs share with the Gram-negative group of eubacteria a number of sequence features that are not present in any archaebacterium or Gram-positive bacterium, indicating their evolution from this group of organisms. Some implications of these findings regarding the evolution of eukaryotic cells and ER are discussed. PMID:8159675
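    Neighbor-joining, one of the methods named above, builds a tree by repeatedly joining the pair of nodes that minimizes the criterion Q(a, b) = (n - 2) d(a, b) - Σ_k d(a, k) - Σ_k d(b, k). The sketch below is a plain implementation of the Saitou and Nei algorithm on a distance matrix. Ties are broken by taking the first pair found, and bootstrap resampling is omitted. The four-taxon additive matrix in the test is a toy example, not HSP70 data.

```python
def neighbor_joining(names, dist):
    """Neighbor-joining on a symmetric distance matrix (list of lists).

    Returns the unrooted tree as a Newick string with branch lengths.
    Taxon names must be unique; internal nodes are keyed by their Newick text.
    """
    nodes = list(names)
    d = {(a, b): dist[i][j] for i, a in enumerate(names) for j, b in enumerate(names)}
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[(a, b)] for b in nodes) for a in nodes}  # row sums
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[(a, b)] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the joined pair to their new parent node u.
        la = 0.5 * d[(a, b)] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[(a, b)] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        for c in nodes:
            if c not in (a, b):
                d[(u, c)] = d[(c, u)] = 0.5 * (d[(a, c)] + d[(b, c)] - d[(a, b)])
        d[(u, u)] = 0.0
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    return f"({a},{b}:{d[(a, b)]:g});"
```

    On an additive matrix, such as distances read off a true tree, neighbor-joining recovers that tree exactly, which is what the test checks.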

  12. The Cohesion Protein SOLO Associates with SMC1 and Is Required for Synapsis, Recombination, Homolog Bias and Cohesion and Pairing of Centromeres in Drosophila Meiosis

    PubMed Central

    Yan, Rihui; McKee, Bruce D.

    2013-01-01

    Cohesion between sister chromatids is mediated by cohesin and is essential for proper meiotic segregation of both sister chromatids and homologs. solo encodes a Drosophila meiosis-specific cohesion protein with no apparent sequence homology to cohesins that is required in male meiosis for centromere cohesion, proper orientation of sister centromeres and centromere enrichment of the cohesin subunit SMC1. In this study, we show that solo is involved in multiple aspects of meiosis in female Drosophila. Null mutations in solo caused the following phenotypes: 1) high frequencies of homolog and sister chromatid nondisjunction (NDJ) and sharply reduced frequencies of homolog exchange; 2) reduced transmission of a ring-X chromosome, an indicator of elevated frequencies of sister chromatid exchange (SCE); 3) premature loss of centromere pairing and cohesion during prophase I, as indicated by elevated foci counts of the centromere protein CID; 4) instability of the lateral elements (LEs) and central regions of synaptonemal complexes (SCs), as indicated by fragmented and spotty staining of the chromosome core/LE component SMC1 and the transverse filament protein C(3)G, respectively, at all stages of pachytene. SOLO and SMC1 are both enriched on centromeres throughout prophase I, co-align along the lateral elements of SCs and reciprocally co-immunoprecipitate from ovarian protein extracts. Our studies demonstrate that SOLO is closely associated with meiotic cohesin and required both for enrichment of cohesin on centromeres and stable assembly of cohesin into chromosome cores. These events underlie and are required for stable cohesion of centromeres, synapsis of homologous chromosomes, and a recombination mechanism that suppresses SCE to preferentially generate homolog crossovers (homolog bias). 
We propose that SOLO is a subunit of a specialized meiotic cohesin complex that mediates both centromeric and axial arm cohesion and promotes homolog bias as a component of chromosome cores. PMID:23874232

  13. The cohesion protein SOLO associates with SMC1 and is required for synapsis, recombination, homolog bias and cohesion and pairing of centromeres in Drosophila Meiosis.

    PubMed

    Yan, Rihui; McKee, Bruce D

    2013-01-01

    Cohesion between sister chromatids is mediated by cohesin and is essential for proper meiotic segregation of both sister chromatids and homologs. solo encodes a Drosophila meiosis-specific cohesion protein with no apparent sequence homology to cohesins that is required in male meiosis for centromere cohesion, proper orientation of sister centromeres and centromere enrichment of the cohesin subunit SMC1. In this study, we show that solo is involved in multiple aspects of meiosis in female Drosophila. Null mutations in solo caused the following phenotypes: 1) high frequencies of homolog and sister chromatid nondisjunction (NDJ) and sharply reduced frequencies of homolog exchange; 2) reduced transmission of a ring-X chromosome, an indicator of elevated frequencies of sister chromatid exchange (SCE); 3) premature loss of centromere pairing and cohesion during prophase I, as indicated by elevated foci counts of the centromere protein CID; 4) instability of the lateral elements (LEs) and central regions of synaptonemal complexes (SCs), as indicated by fragmented and spotty staining of the chromosome core/LE component SMC1 and the transverse filament protein C(3)G, respectively, at all stages of pachytene. SOLO and SMC1 are both enriched on centromeres throughout prophase I, co-align along the lateral elements of SCs and reciprocally co-immunoprecipitate from ovarian protein extracts. Our studies demonstrate that SOLO is closely associated with meiotic cohesin and required both for enrichment of cohesin on centromeres and stable assembly of cohesin into chromosome cores. These events underlie and are required for stable cohesion of centromeres, synapsis of homologous chromosomes, and a recombination mechanism that suppresses SCE to preferentially generate homolog crossovers (homolog bias). 
We propose that SOLO is a subunit of a specialized meiotic cohesin complex that mediates both centromeric and axial arm cohesion and promotes homolog bias as a component of chromosome cores.

  14. Navigating highly homologous genes in a molecular diagnostic setting: a resource for clinical next-generation sequencing.

    PubMed

    Mandelker, Diana; Schmidt, Ryan J; Ankala, Arunkanth; McDonald Gibson, Kristin; Bowser, Mark; Sharma, Himanshu; Duffy, Elizabeth; Hegde, Madhuri; Santani, Avni; Lebo, Matthew; Funke, Birgit

    2016-12-01

    Next-generation sequencing (NGS) is now routinely used to interrogate large sets of genes in a diagnostic setting. Regions of high sequence homology continue to be a major challenge for short-read technologies and can lead to false-positive and false-negative diagnostic errors. At the scale of whole-exome sequencing (WES), laboratories may have limited knowledge of the genes and regions that pose technical hurdles due to high homology. We have created an exome-wide resource, tailored toward diagnostic applications, that catalogs highly homologous regions. This resource was developed using a mappability-based approach adapted to current Sanger and NGS protocols. Gene-level and exon-level lists delineate regions that are difficult or impossible to analyze via standard NGS. These regions are ranked by degree of affectedness, annotated for medical relevance, and classified by the type of homology (within-gene, different functional gene, known pseudogene, uncharacterized noncoding region). Additionally, we provide a list of exons that cannot be analyzed by short-amplicon Sanger sequencing. This resource can help guide clinical test design, supplemental assay implementation, and results interpretation in the context of high homology. Genet Med 18(12), 1282-1289.
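    The abstract does not spell out the mappability computation. A simple proxy, assumed here, is the fraction of positions in a region whose k-mer occurs exactly once in the reference, counting both strands. A read drawn from a position that fails this test cannot be placed uniquely by exact matching. Real mappability tracks also allow mismatches and depend on the aligner. The function names and toy sequences are hypothetical.

```python
from collections import Counter

def revcomp(s):
    """Reverse complement of an uppercase ACGT string."""
    return s.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def unique_kmer_fraction(reference, start, end, k):
    """Fraction of k-mer start positions in [start, end) whose k-mer, counted
    together with its reverse complement, occurs exactly once in `reference`."""
    reference = reference.upper()
    counts = Counter(reference[i:i + k] for i in range(len(reference) - k + 1))
    positions = range(start, min(end, len(reference) - k + 1))
    if not positions:
        return 0.0
    unique = 0
    for i in positions:
        kmer = reference[i:i + k]
        rc = revcomp(kmer)
        # Palindromic k-mers are their own reverse complement: count them once.
        total = counts[kmer] + (counts[rc] if rc != kmer else 0)
        if total == 1:
            unique += 1
    return unique / len(positions)
```

    A region that is copied elsewhere in the reference, for example in a pseudogene, scores near 0. Such a region would be flagged as difficult to analyze with short reads.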

  15. Evidence from molecular dynamics simulations of conformational preorganization in the ribonuclease H active site

    PubMed Central

    Stafford, Kate A.; Palmer III, Arthur G.

    2014-01-01

    Ribonuclease H1 (RNase H) enzymes are well-conserved endonucleases that are present in all domains of life and are particularly important in the life cycle of retroviruses as domains within reverse transcriptase. Despite extensive study, especially of the E. coli homolog, the interaction of the highly negatively charged active site with catalytically required magnesium ions remains poorly understood. In this work, we describe molecular dynamics simulations of the E. coli homolog in complex with magnesium ions, as well as simulations of other homologs in their apo states. Collectively, these results suggest that the active site is highly rigid in the apo state of all homologs studied and is conformationally preorganized to favor the binding of a magnesium ion. Notably, representatives of bacterial, eukaryotic, and retroviral RNases H all exhibit similar active-site rigidity, suggesting that this dynamic feature is only subtly modulated by amino acid sequence and is primarily imposed by the distinctive RNase H protein fold. PMID:25075292

  16. FOLD-EM: automated fold recognition in medium- and low-resolution (4-15 Å) electron density maps.

    PubMed

    Saha, Mitul; Morais, Marc C

    2012-12-15

    Owing to the size and complexity of large multi-component biological assemblies, the most tractable approach to determining their atomic structure is often to fit high-resolution X-ray crystallographic or nuclear magnetic resonance structures of isolated components into lower resolution electron density maps of the larger assembly obtained using cryo-electron microscopy (cryo-EM). This hybrid approach to structure determination requires that an atomic resolution structure of each component, or a suitable homolog, is available. If neither is available, then the amount of structural information regarding that component is limited by the resolution of the cryo-EM map. However, even if a suitable homolog cannot be identified using sequence analysis, a search for structural homologs should still be performed because structural homology often persists throughout evolution even when sequence homology is undetectable. As macromolecules can often be described as a collection of independently folded domains, one way of searching for structural homologs would be to systematically fit representative domain structures from a protein domain database into the medium/low resolution cryo-EM map and return the best fits. Taken together, the best fitting non-overlapping structures would constitute a 'mosaic' backbone model of the assembly that could aid map interpretation and illuminate biological function. Using the computational principles of the Scale-Invariant Feature Transform (SIFT), we have developed FOLD-EM, a computational tool that can identify folded macromolecular domains in medium to low resolution (4-15 Å) electron density maps and return a model of the constituent polypeptides in a fully automated fashion. As a by-product, FOLD-EM can also perform flexible multi-domain fitting that may provide insight into conformational changes that occur in macromolecular assemblies.

  17. Amino acid sequences of ribosomal proteins S11 from Bacillus stearothermophilus and S19 from Halobacterium marismortui. Comparison of the ribosomal protein S11 family.

    PubMed

    Kimura, M; Kimura, J; Hatakeyama, T

    1988-11-21

    The complete amino acid sequences of ribosomal proteins S11 from the Gram-positive eubacterium Bacillus stearothermophilus and of S19 from the archaebacterium Halobacterium marismortui have been determined. A search for homologous sequences of these proteins revealed that they belong to the ribosomal protein S11 family. Homologous proteins have previously been sequenced from Escherichia coli as well as from chloroplast, yeast and mammalian ribosomes. A pairwise comparison of the amino acid sequences showed that Bacillus protein S11 shares 68% identical residues with S11 from Escherichia coli and a slightly lower homology (52%) with the homologous chloroplast protein. The halophilic protein S19 is more related to the eukaryotic (45-49%) than to the eubacterial counterparts (35%).
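    Percent identity can be defined with different denominators (aligned columns, length of the shorter sequence, and so on), and the abstract does not say which was used. A minimal sketch, assuming the two sequences are already aligned and identity is counted over columns where neither sequence has a gap:

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over aligned columns in which neither
    sequence has a gap. Assumes a and b are pre-aligned (equal length)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)
```

    For the toy alignment "MKV-LA" / "MKIALA", the gapped column is skipped and 4 of the remaining 5 columns match, giving 80.0. The sequences here are hypothetical, not the S11/S19 data.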

  18. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction.

    PubMed

    Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Jing-Yan Wang, Jim; Gao, Xin

    2016-06-15

    Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence-structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM-HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods. Our program is freely available for download from http://sfb.kaust.edu.sa/Pages/Software.aspx. Contact: xin.gao@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  19. Defining and predicting structurally conserved regions in protein superfamilies

    PubMed Central

    Huang, Ivan K.; Grishin, Nick V.

    2013-01-01

    Motivation: The structures of homologous proteins are generally better conserved than their sequences. This phenomenon is demonstrated by the prevalence of structurally conserved regions (SCRs) even in highly divergent protein families. Defining SCRs requires the comparison of two or more homologous structures and is affected by their availability and divergence, and our ability to deduce structurally equivalent positions among them. In the absence of multiple homologous structures, it is necessary to predict SCRs of a protein using information from only a set of homologous sequences and (if available) a single structure. Accurate SCR predictions can benefit homology modelling and sequence alignment. Results: Using pairwise DaliLite alignments among a set of homologous structures, we devised a simple measure of structural conservation, termed structural conservation index (SCI). SCI was used to distinguish SCRs from non-SCRs. A database of SCRs was compiled from 386 SCOP superfamilies containing 6489 protein domains. Artificial neural networks were then trained to predict SCRs with various features deduced from a single structure and homologous sequences. Assessment of the predictions via a 5-fold cross-validation method revealed that predictions based on features derived from a single structure perform similarly to ones based on homologous sequences, while combining sequence and structural features was optimal in terms of accuracy (0.755) and Matthews correlation coefficient (0.476). These results suggest that even without information from multiple structures, it is still possible to effectively predict SCRs for a protein. Finally, inspection of the structures with the worst predictions pinpoints difficulties in SCR definitions. Availability: The SCR database and the prediction server can be found at http://prodata.swmed.edu/SCR. 
    Contact: 91huangi@gmail.com or grishin@chop.swmed.edu. Supplementary information: Supplementary data are available at Bioinformatics Online. PMID:23193223
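    The accuracy (0.755) and Matthews correlation coefficient (0.476) reported above are standard binary-classification metrics, here applied to residue-level SCR vs. non-SCR calls. A minimal sketch of both from confusion-matrix counts; the counts used in the checks are hypothetical, not the paper's data:

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when any marginal
    total is zero (a common convention, assumed here)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

    MCC ranges from -1 to 1 and, unlike accuracy, stays informative when SCR and non-SCR residues are unevenly represented, which is why reporting both is useful.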

  20. ɛ-connectedness, finite approximations, shape theory and coarse graining in hyperspaces

    NASA Astrophysics Data System (ADS)

    Alonso-Morón, Manuel; Cuchillo-Ibanez, Eduardo; Luzón, Ana

    2008-12-01

    We use upper semifinite hyperspaces of compacta to describe ε-connectedness and to compute homology from finite approximations. We find a new connection between ε-connectedness and the so-called Shape Theory. We construct a geodesically complete R-tree, by means of ε-components at different resolutions, whose behavior at infinity captures the topological structure of the space of components of a given compact metric space. We also construct inverse sequences of finite spaces using internal finite approximations of compact metric spaces. These sequences can be converted into inverse sequences of polyhedra and simplicial maps by means of what we call the Alexandroff-McCord correspondence. This correspondence allows us to relate upper semifinite hyperspaces of finite approximations with the Vietoris-Rips complexes of such approximations at different resolutions. Two motivating examples are included in the introduction. We propose this procedure as a different mathematical foundation for problems on data analysis. This process is intrinsically related to the methodology of shape theory. This paper reinforces Robins’s idea of using methods from shape theory to compute homology from finite approximations.
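    For a finite sample of a metric space, two points lie in the same ε-component when an ε-chain joins them, i.e. a sequence of points in which consecutive points are closer than ε. Those components are the connected components of the graph linking all pairs at distance < ε. A minimal union-find sketch, assuming Euclidean points; it is not the authors' hyperspace construction, only the ε-component step:

```python
import math
from itertools import combinations


def epsilon_components(points, eps):
    """Partition point indices into ε-components: i and j are joined when
    some chain of points links them with consecutive Euclidean distances < eps."""
    parent = list(range(len(points)))

    def find(i):
        # Walk to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) < eps:
            parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

    Sweeping eps from small to large shows components merging as the resolution coarsens, which is the information the R-tree in the abstract organizes across resolutions.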

  1. Overcoming Sequence Misalignments with Weighted Structural Superposition

    PubMed Central

    Khazanov, Nickolay A.; Damm-Ganamet, Kelly L.; Quang, Daniel X.; Carlson, Heather A.

    2012-01-01

    An appropriate structural superposition identifies similarities and differences between homologous proteins that are not evident from sequence alignments alone. We have coupled our Gaussian-weighted RMSD (wRMSD) tool with a sequence aligner and seed extension (SE) algorithm to create a robust technique for overlaying structures and aligning sequences of homologous proteins (HwRMSD). HwRMSD overcomes errors in the initial sequence alignment that would normally propagate into a standard RMSD overlay. SE can generate a corrected sequence alignment from the improved structural superposition obtained by wRMSD. HwRMSD’s robust performance and its superiority over standard RMSD are demonstrated over a range of homologous proteins. Its better overlay results in corrected sequence alignments with good agreement to HOMSTRAD. Finally, HwRMSD is compared to established structural alignment methods: FATCAT, SSM, CE, and Dalilite. Most methods are comparable at placing residue pairs within 2 Å, but HwRMSD places many more residue pairs within 1 Å, providing a clear advantage. Such high accuracy is essential in drug design, where small distances can have a large impact on computational predictions. This level of accuracy is also needed to correct sequence alignments in an automated fashion, especially for omics-scale analysis. HwRMSD can align homologs with low sequence identity and large conformational differences, cases where both sequence-based and structure-based methods may fail. The HwRMSD pipeline overcomes the dependency of structural overlays on initial sequence pairing and removes the need to determine the best sequence-alignment method, substitution matrix, and gap parameters for each unique pair of homologs. PMID:22733542
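    The key idea behind Gaussian weighting is that atom pairs which deviate strongly contribute little, so a conserved core dominates the fit. The sketch below shows only that weighting for two already-superposed, already-matched coordinate lists; the weight form w = exp(-d²/c), the width c, and the function names are assumptions, and the iterative re-superposition and seed-extension steps of HwRMSD are omitted.

```python
import math


def gaussian_weights(coords_a, coords_b, c=1.0):
    """Per-pair weights w_i = exp(-d_i**2 / c), where d_i is the distance
    between matched atoms; c is an assumed width parameter."""
    return [math.exp(-math.dist(p, q) ** 2 / c) for p, q in zip(coords_a, coords_b)]


def weighted_rmsd(coords_a, coords_b, weights):
    """sqrt(sum(w_i * d_i**2) / sum(w_i)) over matched atom pairs."""
    num = sum(w * math.dist(p, q) ** 2 for p, q, w in zip(coords_a, coords_b, weights))
    return math.sqrt(num / sum(weights))
```

    With one matched pair at 0 Å and one at 3 Å, the unweighted RMSD is about 2.12 Å, while the Gaussian-weighted value drops to roughly 0.03 Å because the outlier pair receives a weight near exp(-9).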

  2. Granular support vector machines with association rules mining for protein homology prediction.

    PubMed

    Tang, Yuchun; Jin, Bo; Zhang, Yan-Qing

    2005-01-01

    Protein homology prediction between protein sequences is one of the critical problems in computational biology. Such complex classification problems are common in medical and biological information processing applications. Building a model with superior generalization capability from training samples is essential for mining knowledge to accurately predict or classify unseen samples and to effectively support human experts in making correct decisions. A new learning model called granular support vector machines (GSVM) is proposed based on our previous work. GSVM systematically and formally combines the principles from statistical learning theory and granular computing theory and thus provides an interesting new mechanism to address complex classification problems. It works by building a sequence of information granules and then building support vector machines (SVM) in some of these information granules on demand. A good granulation method to find suitable granules is crucial for modeling a GSVM with good performance. In this paper, we also propose an association rules-based granulation method. Granules induced by association rules with high enough confidence and significant support are left as they are because of their high "purity" and significant effect on simplifying the classification task. For every other granule, an SVM is modeled to discriminate the corresponding data. In this way, a complex classification problem is divided into multiple smaller problems so that the learning task is simplified. The proposed algorithm, here named GSVM-AR, is compared with SVM on the KDDCUP04 protein homology prediction data. The experimental results show that finding the splitting hyperplane is not a trivial task (association rules should be selected carefully to avoid overfitting) and that GSVM-AR does show significant improvement compared to building one single SVM in the whole feature space.
    Another advantage is that GSVM-AR is easy to implement. More importantly, GSVM provides a new mechanism to address complex classification problems.

  3. Endophyte Microbiome Diversity in Micropropagated Atriplex canescens and Atriplex torreyi var griffithsii

    PubMed Central

    Lucero, Mary E.; Unc, Adrian; Cooke, Peter; Dowd, Scot; Sun, Shulei

    2011-01-01

    Microbial diversity associated with micropropagated Atriplex species was assessed using microscopy, isolate culturing, and sequencing. Light, electron, and confocal microscopy revealed microbial cells in aseptically regenerated leaves and roots. Clone libraries and tag-encoded FLX amplicon pyrosequencing (TEFAP) analysis amplified sequences from callus homologous to diverse fungal and bacterial taxa. Culturing isolated some seed-borne endophyte taxa which could be readily propagated apart from the host. Microbial cells were observed within biofilm-like residues associated with plant cell surfaces and intercellular spaces. Various universal primers amplified both plant and microbial sequences, with different primers revealing different patterns of fungal diversity. Bacterial and fungal TEFAP followed by alignment with sequences from curated databases revealed 7 bacterial and 17 ascomycete taxa in A. canescens, and 5 bacterial taxa in A. torreyi. Additional diversity was observed among isolates and clone libraries. Micropropagated Atriplex retains a complex, intimately associated microbiome which includes diverse strains well poised to interact in manners that influence host physiology. Microbiome analysis was facilitated by high throughput sequencing methods, but primer biases continue to limit recovery of diverse sequences from even moderately complex communities. PMID:21437280

  4. Assessing the evolutionary rate of positional orthologous genes in prokaryotes using synteny data

    PubMed Central

    Lemoine, Frédéric; Lespinet, Olivier; Labedan, Bernard

    2007-01-01

    Background Comparison of completely sequenced microbial genomes has revealed how fluid these genomes are. Detecting synteny blocks requires reliable methods for determining the orthologs among the whole set of homologs detected by exhaustive comparisons between each pair of completely sequenced genomes. This is a complex and difficult problem in the field of comparative genomics, but solving it will help to better understand the way prokaryotic genomes evolve. Results We have developed a suite of programs that automate three essential steps to study conservation of gene order, and validated them with a set of 107 bacteria and archaea that cover the majority of the prokaryotic taxonomic space. We identified the whole set of shared homologs between two or more species and computed the evolutionary distance separating each pair of homologs. We applied two strategies to extract from the set of homologs a collection of valid orthologs shared by at least two genomes. The first computes the Reciprocal Smallest Distance (RSD) using the PAM distances separating pairs of homologs. The second method groups homologs in families and reconstructs each family's evolutionary tree, distinguishing bona fide orthologs as well as paralogs created after the last speciation event. Although the phylogenetic tree method often succeeds where RSD fails, the reverse can occasionally be true. Accordingly, we used the data obtained with either method or their intersection to enumerate the orthologs that are adjacent in each pair of genomes, the Positional Orthologous Genes (POGs), and to further study their properties. Once all these synteny blocks had been detected, we showed that POGs are subject to more evolutionary constraints than orthologs outside synteny groups, whatever the taxonomic distance separating the compared organisms.
    Conclusion The suite of programs described in this paper allows a reliable detection of orthologs and is useful for evaluating gene order conservation in prokaryotes, whatever their taxonomic distance. Thus, our approach will ease the rapid identification of POGs in the next few years, as we expect to be inundated with thousands of completely sequenced microbial genomes. PMID:18047665
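    The reciprocal-smallest-distance criterion described above pairs gene a in genome A with gene b in genome B only when each is the other's closest homolog. A minimal sketch, assuming the pairwise evolutionary distances (e.g. PAM distances) have already been computed; the dictionary layout and gene names are hypothetical, and the alignment and distance-estimation steps of a full RSD pipeline are not shown:

```python
def reciprocal_smallest_distance(dist):
    """dist maps (gene_a, gene_b) -> evolutionary distance for homologous
    pairs between genomes A and B. Returns sorted (a, b) pairs in which each
    gene is the other's closest homolog. Ties keep the first pair seen."""
    best_for_a, best_for_b = {}, {}
    for (a, b), d in dist.items():
        if a not in best_for_a or d < dist[(a, best_for_a[a])]:
            best_for_a[a] = b
        if b not in best_for_b or d < dist[(best_for_b[b], b)]:
            best_for_b[b] = a
    return sorted((a, b) for a, b in best_for_a.items() if best_for_b.get(b) == a)
```

    With distances a1-b1 = 10, a1-b2 = 30, a2-b1 = 20, a2-b2 = 25, only (a1, b1) is reciprocal: a2's closest hit is b1, but b1's closest hit is a1, so a2 is left unpaired. Such unresolved cases are where the tree-based method described above can help.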

  5. Hydra meiosis reveals unexpected conservation of structural synaptonemal complex proteins across metazoans.

    PubMed

    Fraune, Johanna; Alsheimer, Manfred; Volff, Jean-Nicolas; Busch, Karoline; Fraune, Sebastian; Bosch, Thomas C G; Benavente, Ricardo

    2012-10-09

    The synaptonemal complex (SC) is a key structure of meiosis, mediating the stable pairing (synapsis) of homologous chromosomes during prophase I. Its remarkable tripartite structure is evolutionarily well conserved and can be found in almost all sexually reproducing organisms. However, comparison of the different SC protein components in the common meiosis model organisms Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, and Mus musculus revealed no sequence homology. This discrepancy challenged the hypothesis that the SC arose only once in evolution. To pursue this matter we focused on the evolution of SYCP1 and SYCP3, the two major structural SC proteins of mammals. Remarkably, our comparative bioinformatic and expression studies revealed that SYCP1 and SYCP3 are also components of the SC in the basal metazoan Hydra. In contrast to previous assumptions, we therefore conclude that SYCP1 and SYCP3 form monophyletic groups of orthologous proteins across metazoans.

  6. Hydra meiosis reveals unexpected conservation of structural synaptonemal complex proteins across metazoans

    PubMed Central

    Fraune, Johanna; Alsheimer, Manfred; Volff, Jean-Nicolas; Busch, Karoline; Fraune, Sebastian; Bosch, Thomas C. G.; Benavente, Ricardo

    2012-01-01

    The synaptonemal complex (SC) is a key structure of meiosis, mediating the stable pairing (synapsis) of homologous chromosomes during prophase I. Its remarkable tripartite structure is evolutionarily well conserved and can be found in almost all sexually reproducing organisms. However, comparison of the different SC protein components in the common meiosis model organisms Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, and Mus musculus revealed no sequence homology. This discrepancy challenged the hypothesis that the SC arose only once in evolution. To pursue this matter we focused on the evolution of SYCP1 and SYCP3, the two major structural SC proteins of mammals. Remarkably, our comparative bioinformatic and expression studies revealed that SYCP1 and SYCP3 are also components of the SC in the basal metazoan Hydra. In contrast to previous assumptions, we therefore conclude that SYCP1 and SYCP3 form monophyletic groups of orthologous proteins across metazoans. PMID:23012415

  7. Isolation and sequence of partial cDNA clones of human L1: homology of human and rodent L1 in the cytoplasmic region.

    PubMed

    Harper, J R; Prince, J T; Healy, P A; Stuart, J K; Nauman, S J; Stallcup, W B

    1991-03-01

    We have isolated cDNA clones coding for the human homologue of the neuronal cell adhesion molecule L1. The nucleotide sequence of the cDNA clones and the deduced primary amino acid sequence of the carboxy terminal portion of the human L1 are homologous to the corresponding sequences of mouse L1 and rat NILE glycoprotein, with an especially high sequence identity in the cytoplasmic regions of the proteins. There is also protein sequence homology with the cytoplasmic region of the Drosophila cell adhesion molecule, neuroglian. The conservation of the cytoplasmic domain argues for an important functional role for this portion of the molecule.

  8. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing.

    PubMed

    Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P; Panitz, Frank; Bendixen, Christian; Nielsen, Rasmus; Willerslev, Eske

    2007-02-14

    The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot subsequently be assigned to their original sources. We use conventional PCR with 5'-nucleotide-tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing on the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5' tag analysis. We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (misassignment rate < 0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in a single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5'-labelled with a cytosine are heavily overrepresented among the final sequences, while those 5'-labelled with a thymine are strongly underrepresented. A weaker bias also exists with regard to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5' primer tagging is a useful method by which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products.
    The new approach will be of value to a broad range of research areas, such as comparative genomics, complete mitochondrial analyses, population genetics, and phylogenetics.
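    Tracing each read back to its specimen amounts to demultiplexing on the 5' tag. A minimal sketch, assuming every tag has the same length (the abstract mentions dinucleotide tags), exact tag matching, and that reads with an unknown prefix are set aside; the tag sequences and specimen names are hypothetical:

```python
from collections import defaultdict


def demultiplex(reads, tags):
    """Assign each read to a specimen by exact match of its 5' tag.

    tags maps tag sequence -> specimen name (all tags assumed equal length).
    Returns (assigned, unassigned): assigned maps specimen -> list of reads
    with the tag trimmed off; unassigned holds reads with no matching tag."""
    tag_len = len(next(iter(tags)))
    assigned = defaultdict(list)
    unassigned = []
    for read in reads:
        specimen = tags.get(read[:tag_len])
        if specimen is None:
            unassigned.append(read)
        else:
            assigned[specimen].append(read[tag_len:])  # strip tag before analysis
    return dict(assigned), unassigned
```

    Counting the assigned reads per tag is also a direct way to see the kind of 5'-nucleotide bias reported above, since tags starting with C or T would show skewed read counts.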

  9. DNA Repair: The Search for Homology.

    PubMed

    Haber, James E

    2018-05-01

    The repair of chromosomal double-strand breaks (DSBs) by homologous recombination is essential to maintain genome integrity. The key step in DSB repair is the RecA/Rad51-mediated process to match sequences at the broken end to homologous donor sequences that can be used as a template to repair the lesion. Here, in reviewing research about DSB repair, I consider the many factors that appear to play important roles in the successful search for homology by several homologous recombination mechanisms. See also the video abstract here: https://youtu.be/vm7-X5uIzS8. © 2018 WILEY Periodicals, Inc.

  10. The limits of protein sequence comparison?

    PubMed Central

    Pearson, William R; Sierk, Michael L

    2010-01-01

    Modern sequence alignment algorithms are used routinely to identify homologous proteins, proteins that share a common ancestor. Homologous proteins always share similar structures and often have similar functions. Over the past 20 years, sequence comparison has become both more sensitive, largely because of profile-based methods, and more reliable, because of more accurate statistical estimates. As sequence and structure databases become larger, and comparison methods become more powerful, reliable statistical estimates will become even more important for distinguishing similarities that are due to homology from those that are due to analogy (convergence). The newest sequence alignment methods are more sensitive than older methods, but more accurate statistical estimates are needed for their full power to be realized. PMID:15919194
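    One widely used statistical estimate for local alignment scores is the Karlin-Altschul expectation E = K·m·n·e^(−λS), often reported via the bit score S' = (λS − ln K)/ln 2, so that E = m·n·2^(−S'). A minimal sketch; λ and K depend on the scoring system and residue composition, and the values in the checks are arbitrary placeholders rather than parameters for any real matrix:

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized (bit) score S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue(raw_score: float, lam: float, k: float, m: int, n: int) -> float:
    """Expected number of chance local alignments scoring >= S between a
    query of length m and a database of total length n."""
    return k * m * n * math.exp(-lam * raw_score)
```

    Because E scales linearly with the database length n, the same raw score becomes less significant as databases grow, which is one reason the abstract stresses accurate statistics for distinguishing homology from chance similarity.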

  11. Molecular basis of length polymorphism in the human zeta-globin gene complex.

    PubMed Central

    Goodbourn, S E; Higgs, D R; Clegg, J B; Weatherall, D J

    1983-01-01

    The length polymorphism between the human zeta-globin gene and its pseudogene is caused by an allele-specific variation in the copy number of a tandemly repeating 36-base-pair sequence. This sequence is related to a tandemly repeated 14-base-pair sequence in the 5' flanking region of the human insulin gene, which is known to cause length polymorphism, and to a repetitive sequence in intervening sequence (IVS) 1 of the pseudo-zeta-globin gene. Evidence is presented that the latter is also of variable length, probably because of differences in the copy number of the tandem repeat. The homology between the three length polymorphisms may be an indication of the presence of a more widespread group of related sequences in the human genome, which might be useful for generalized linkage studies. PMID:6308667
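    The length polymorphism arises because alleles carry different numbers of copies of the same tandem unit, so two alleles differing by c copies of a 36-bp unit differ in length by 36·c bp. A toy sketch of counting the longest run of exact, consecutive copies of a unit; the short unit in the checks is hypothetical, and real repeat arrays are usually imperfect, which exact matching ignores:

```python
def max_tandem_copies(seq: str, unit: str) -> int:
    """Return the largest number of consecutive, exact, non-overlapping
    copies of `unit` found anywhere in `seq`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    step = len(unit)
    best = 0
    for start in range(len(seq)):
        copies = 0
        # Extend the run while the next block of len(unit) characters matches.
        while seq.startswith(unit, start + copies * step):
            copies += 1
        best = max(best, copies)
    return best
```

    For instance, "xxABCABCABCyyABC" contains a run of 3 copies of "ABC"; the isolated final copy does not extend that run.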

  12. Multiple templates-based homology modeling enhances structure quality of AT1 receptor: validation by molecular dynamics and antagonist docking.

    PubMed

    Sokkar, Pandian; Mohandass, Shylajanaciyar; Ramachandran, Murugesan

    2011-07-01

    We present a comparative account of the 3D structures of the human type-1 receptor (AT1) for angiotensin II (AngII), modeled using three different methodologies. AngII activates a wide spectrum of signaling responses via the AT1 receptor, which mediates physiological control of blood pressure and diverse pathological actions in cardiovascular, renal, and other cell types. The availability of a 3D model of the AT1 receptor would significantly enhance the development of new drugs for cardiovascular diseases. However, the low sequence similarity between the AT1 receptor and available templates complicates straightforward homology modeling, and hence different modeling methodologies need to be evaluated before the models are used for sensitive applications such as rational drug design. Three models were generated for the AT1 receptor by (1) homology modeling with bovine rhodopsin as the template, (2) homology modeling with multiple templates, and (3) threading using the I-TASSER web server. Molecular dynamics (MD) simulation (15 ns) of the models in an explicit membrane-water system, Ramachandran plot analysis, and molecular docking with antagonists led to the conclusion that multiple-template homology modeling outperforms the other methodologies for AT1 modeling.

  13. DWARF – a data warehouse system for analyzing protein families

    PubMed Central

    Fischer, Markus; Thai, Quan K; Grieb, Melanie; Pleiss, Jürgen

    2006-01-01

    Background The emerging field of integrative bioinformatics provides the tools to organize and systematically analyze vast amounts of highly diverse biological data and thus allows one to gain a novel understanding of complex biological systems. The data warehouse DWARF applies integrative bioinformatics approaches to the analysis of large protein families. Description The data warehouse system DWARF integrates data on sequence, structure, and functional annotation for protein fold families. The underlying relational data model consists of three major sections representing entities related to the protein (biochemical function, source organism, classification to homologous families and superfamilies), the protein sequence (position-specific annotation, mutant information), and the protein structure (secondary structure information, superimposed tertiary structure). Tools for extracting, transforming and loading data from publicly available resources (ExPDB, GenBank, DSSP) are provided to populate the database. The data can be accessed by an interface for searching and browsing, and by analysis tools that operate on annotation, sequence, or structure. We applied DWARF to the family of α/β-hydrolases to host the Lipase Engineering database. Release 2.3 contains 6138 sequences and 167 experimentally determined protein structures, which are assigned to 37 superfamilies and 103 homologous families. Conclusion DWARF has been designed for constructing databases of large structurally related protein families and for evaluating their sequence-structure-function relationships by a systematic analysis of sequence, structure and functional annotation. It has been applied to predict biochemical properties from sequence, and serves as a valuable tool for protein engineering. PMID:17094801
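    The three-part relational model described above (protein-level annotation, sequence, structure) can be illustrated with a toy schema. This is a hypothetical sketch using Python's built-in sqlite3 module; the table and column names are assumptions and do not reproduce DWARF's actual schema:

```python
import sqlite3

# Hypothetical schema loosely mirroring the three sections in the abstract.
SCHEMA = """
CREATE TABLE proteins (
    id INTEGER PRIMARY KEY,
    name TEXT,
    organism TEXT,
    homologous_family TEXT,
    superfamily TEXT
);
CREATE TABLE sequences (
    id INTEGER PRIMARY KEY,
    protein_id INTEGER REFERENCES proteins(id),
    residues TEXT
);
CREATE TABLE structures (
    id INTEGER PRIMARY KEY,
    protein_id INTEGER REFERENCES proteins(id),
    pdb_code TEXT,
    secondary_structure TEXT
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical schema."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    return con


def sequences_per_superfamily(con: sqlite3.Connection) -> list:
    """Count stored sequences per superfamily by joining on protein_id."""
    return con.execute(
        "SELECT p.superfamily, COUNT(s.id) FROM proteins AS p "
        "JOIN sequences AS s ON s.protein_id = p.id "
        "GROUP BY p.superfamily ORDER BY p.superfamily"
    ).fetchall()
```

    Keeping family and superfamily labels on the protein entity lets sequence- and structure-level queries be grouped by classification through a single join, as in the per-superfamily count above.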

  14. Using structure to explore the sequence alignment space of remote homologs.

    PubMed

    Kuziemko, Andrew; Honig, Barry; Petrey, Donald

    2011-10-01

    Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is "optimal" in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are "suboptimal" in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for "modelability", we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.
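    To make the notion of a DP-optimal alignment concrete, here is a minimal Needleman-Wunsch global alignment score with linear gap costs. The match, mismatch, and gap values are assumptions for illustration (real protein aligners use substitution matrices and affine gaps), and this is not the structure-based method the abstract proposes:

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty),
    computed row by row to keep memory O(len(b))."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]
```

    For "ACGT" vs. "AGT" the best score is 1 (align A, G, T and open one gap opposite C). The abstract's point is that, for remote homologs, the alignment achieving this maximum need not be the structurally correct one.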

  15. Mutation of Hip’s Carboxy-Terminal Region Inhibits a Transitional Stage of Progesterone Receptor Assembly

    PubMed Central

    Prapapanich, Viravan; Chen, Shiying; Smith, David F.

    1998-01-01

    Steroid receptor complexes are assembled through an ordered, multistep pathway involving multiple components of the cytoplasmic chaperone machinery. Two of these components are Hsp70-binding proteins, Hip and Hop, that have some limited homology in their C-terminal regions, outside the sequences mapped for Hsp70 binding. Within this region of Hip is a DPEV sequence that occurs twice; in Hop, one DPEV sequence plus a partial second sequence occurs. In an effort to better understand Hip function as it relates to assembly of progesterone receptor complexes, the DPEV region of Hip was targeted for mutations. Each DPEV sequence was mutated to an APAV sequence, singly or in combination. The combined mutation, APAV2, was further combined with a deletion of Hip’s tetratricopeptide repeat region that is required for Hsp70 binding or with a deletion of Hip’s GGMP repeat. An additional mutant was prepared by truncation of Hip’s DPEV-containing C terminus. By comparing interactions of various Hip forms with Hsp70, it was determined that mutation of the DPEV sequences created a dominant inhibitory form of Hip. The mutant Hip-Hsp70 complex was not prevented from interacting with progesterone receptor, but the mutant caused a dose-dependent inhibition of receptor assembly with Hsp90. The behavior of the Hip mutant is consistent with a model in which Hip and Hop are required to facilitate the transition from an early receptor complex with Hsp70 into later complexes containing Hsp90. PMID:9447991

  16. The OGCleaner: filtering false-positive homology clusters.

    PubMed

    Fujimoto, M Stanley; Suvorov, Anton; Jensen, Nicholas O; Clement, Mark J; Snell, Quinn; Bybee, Seth M

    2017-01-01

    Detecting homologous sequences in organisms is an essential step in protein structure and function prediction, gene annotation and phylogenetic tree construction. Heuristic methods are often employed for quality control of putative homology clusters. These heuristics, however, usually only apply to pairwise sequence comparison and do not examine clusters as a whole. We present the Orthology Group Cleaner (the OGCleaner), a tool designed for filtering putative orthology groups as homology or non-homology clusters by considering all sequences in a cluster. The OGCleaner relies on high-quality orthologous groups identified in OrthoDB to train machine learning algorithms that are able to distinguish between true-positive and false-positive homology groups. This package aims to improve the quality of phylogenetic tree construction, especially in instances of lower-quality transcriptome assemblies. Availability: https://github.com/byucsl/ogcleaner. Contact: sfujimoto@gmail.com. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  17. MetaGO: Predicting Gene Ontology of Non-homologous Proteins Through Low-Resolution Protein Structure Prediction and Protein-Protein Network Mapping.

    PubMed

    Zhang, Chengxin; Zheng, Wei; Freddolino, Peter L; Zhang, Yang

    2018-03-10

    Homology-based transferal remains the major approach to computational protein function annotation, but it becomes increasingly unreliable when the sequence identity between query and template decreases below 30%. We propose a novel pipeline, MetaGO, to deduce Gene Ontology attributes of proteins by combining sequence homology-based annotation with low-resolution structure prediction and comparison, and partner's homology-based protein-protein network mapping. The pipeline was tested on a large-scale set of 1000 non-redundant proteins from the CAFA3 experiment. Under the stringent benchmark conditions where templates with >30% sequence identity to the query are excluded, MetaGO achieves average F-measures of 0.487, 0.408, and 0.598 for Molecular Function, Biological Process, and Cellular Component, respectively, which are significantly higher than those achieved by other state-of-the-art function annotation methods. Detailed data analysis shows that the major advantage of MetaGO lies in the new functional homologs detected by partner's homology-based network mapping and by structure-based local and global structure alignments, whose confidence scores can be optimally combined through logistic regression. These data demonstrate the power of using a hybrid model incorporating protein structure and interaction networks to deduce new functional insights beyond traditional sequence homology-based referrals, especially for proteins that lack homologous function templates. The MetaGO pipeline is available at http://zhanglab.ccmb.med.umich.edu/MetaGO/. Copyright © 2018. Published by Elsevier Ltd.
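The F-measure quoted above is the harmonic mean of precision and recall. A minimal per-protein version over sets of predicted and true GO terms, averaged across proteins, is sketched below. This is an assumption-laden simplification: CAFA-style evaluation additionally propagates terms to their ontology ancestors and sweeps a score threshold (Fmax), which is not shown, and the term identifiers are hypothetical placeholders.

```python
def f_measure(predicted: set, truth: set) -> float:
    """Harmonic mean of precision and recall for one protein's predicted terms."""
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0  # also covers empty predictions, avoiding division by zero
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


def mean_f_measure(cases):
    """Average F-measure over (predicted, truth) pairs, one pair per protein."""
    cases = list(cases)
    return sum(f_measure(p, t) for p, t in cases) / len(cases)


# Hypothetical term identifiers, for illustration only.
example = [({"t1", "t2", "t3"}, {"t2", "t3", "t4"}), ({"t5"}, {"t5"})]
print(round(mean_f_measure(example), 3))  # prints 0.833
```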

  18. Structural and functional homology between the RNAPI subunits A14/A43 and the archaeal RNAP subunits E/F

    PubMed Central

    Meka, Hedije; Daoust, Gregoire; Bourke Arnvig, Kristine; Werner, Finn; Brick, Peter; Onesti, Silvia

    2003-01-01

    In the archaeal RNA polymerase and the eukaryotic RNA polymerase II, two subunits (E/F and RPB4/RPB7, respectively) form a heterodimer that reversibly associates with the core of the enzyme. Recently it has emerged that this heterodimer also has a counterpart in the other eukaryotic RNA polymerases: in particular two subunits of RNA polymerase I (A14 and A43) display genetic and biochemical characteristics that are similar to those of the RPB4 and RPB7 subunits, despite the fact that only A43 shows some sequence homology to RPB7. We demonstrate that the sequence of A14 strongly suggests the presence of an HRDC domain, a motif that is found at the C-terminus of a number of helicases and RNases. The same motif is also seen in the structure of the F subunit, suggesting a structural link between A14 and the RPB4/C17/subunit F family, even in the absence of direct sequence homology. We show that it is possible to co-express and co-purify large amounts of the recombinant A14/A43 heterodimer, indicating a tight and specific interaction between the two subunits. To shed light on the function of the heterodimer, we performed gel mobility shift assays and showed that the A14/A43 heterodimer binds single-stranded RNA in a similar way to the archaeal E/F complex. PMID:12888498

  19. Exploring the Limits of DNA Size: Naphtho-homologated DNA Bases and Pairs

    PubMed Central

    Lee, Alex H. F.; Kool, Eric T.

    2008-01-01

    A new design for DNA bases and base pairs is described in which the pyrimidine bases are widened by naphtho-homologation. Two naphtho-homologated deoxyribosides, dyyT (1) and dyyC (2) were synthesized and could be incorporated into oligonucleotides as suitably protected phosphoramidite derivatives. The deoxyribosides were found to be fluorescent, with emission maxima at 446 and 433 nm, respectively. Studies with single substitutions of 1 and 2 in the natural DNA context revealed exceptionally strong base stacking propensity for both. Sequences containing multiple substitutions of 1 and 2 paired opposite adenine and guanine were subsequently mixed and studied by several analytical methods. Data from UV mixing experiments, FRET measurements, fluorescence quenching experiments, and hybridizations on beads suggest that complementary “doublewide DNA” (yyDNA) strands may self-assemble into helical complexes with 1:1 stoichiometry. Data from thermal denaturation plots and CD spectra were less conclusive. Control experiments in one sequence context gave evidence that yyDNA helices, if formed, are preferentially antiparallel and are sequence selective. Hypothesized base pairing schemes are analogous to Watson-Crick pairing, but with glycosidic C1′-C1′ distances widened by over 45%, to ca. 15.2 Å. The possible self-assembly of the double-wide DNA helix establishes a new limit for the size of information-encoding, DNA-like molecules, and the fluorescence of yyDNA bases suggests uses as reporters in monomeric and oligomeric forms. PMID:16834396

  20. Homology analyses of the protein sequences of fatty acid synthases from chicken liver, rat mammary gland, and yeast

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chang, Soo-Ik; Hammes, G.G.

    1989-11-01

    Homology analyses of the protein sequences of chicken liver and rat mammary gland fatty acid synthases were carried out. The amino acid sequences of the chicken and rat enzymes are 67% identical. If conservative substitutions are allowed, 78% of the amino acids are matched. A region of low homology exists between the functional domains, in particular around amino acid residues 1059-1264 of the chicken enzyme. Homologies between the active sites of chicken and rat and of chicken and yeast enzymes have been analyzed by an alignment method. A high degree of homology exists between the active sites of the chicken and rat enzymes. However, the chicken and yeast enzymes show a lower degree of homology. The NADPH-binding dinucleotide folds of the {beta}-ketoacyl reductase and the enoyl reductase sites were identified by comparison with a known consensus sequence for the NADP- and FAD-binding dinucleotide folds. The active sites of all of the enzymes are primarily in hydrophobic regions of the protein. This study suggests that the genes for the functional domains of fatty acid synthase were originally separated, and these genes were connected to each other by using different connecting nucleotide sequences in different species. An alternative explanation for the differences in rat and chicken is a common ancestry and mutations in the joining regions during evolution.
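The distinction above between percent identity (67%) and a match rate that also counts conservative substitutions (78%) can be made concrete for a pair of already-aligned sequences. This is a sketch under assumptions: the residue groups treated as "conservative" are a hypothetical illustration, not the grouping used by the authors (many analyses use a substitution matrix such as BLOSUM62 instead), and gapped columns are simply excluded from the denominator, which is only one of several conventions.

```python
# Hypothetical groups of mutually "conservative" residues, for illustration only.
CONSERVATIVE_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def _group(residue: str) -> str:
    """Return the group containing a residue, or the residue itself if ungrouped."""
    for group in CONSERVATIVE_GROUPS:
        if residue in group:
            return group
    return residue


def identity_and_similarity(aln_a: str, aln_b: str):
    """Percent identity and percent similarity over ungapped aligned columns."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    identical = sum(x == y for x, y in columns)
    similar = sum(x == y or _group(x) == _group(y) for x, y in columns)
    return 100 * identical / len(columns), 100 * similar / len(columns)


print(identity_and_similarity("AC-DK", "ACEDR"))  # prints (75.0, 100.0)
```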

  1. Faster sequence homology searches by clustering subsequences.

    PubMed

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2015-04-15

    Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis. We developed a fast homology search method based on database subsequence clustering, and implemented it as GHOSTZ. This method clusters similar subsequences from a database to perform an efficient seed search and ungapped extension by reducing alignment candidates based on the triangle inequality. The database subsequence clustering technique achieved an ∼2-fold increase in speed without a large decrease in search sensitivity. On metagenomic data, GHOSTZ was ∼2.2-2.8 times faster than RAPSearch and ∼185-261 times faster than BLASTX. The source code is freely available for download at http://www.bi.cs.titech.ac.jp/ghostz/. Contact: akiyama@cs.titech.ac.jp. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
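The triangle-inequality pruning described above can be sketched as follows: if a query is far from a cluster representative, no member of that cluster can be close to the query, so the whole cluster is skipped. This is a minimal illustration, not GHOSTZ's implementation. It assumes Hamming distance on equal-length subsequences and a simple greedy leader clustering; both are stand-ins for the tool's own similarity measure and clustering procedure.

```python
from dataclasses import dataclass, field


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between equal-length strings (a true metric)."""
    if len(a) != len(b):
        raise ValueError("subsequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


@dataclass
class Cluster:
    representative: str
    members: list = field(default_factory=list)
    radius: int = 0  # largest distance from the representative to any member


def build_clusters(subsequences, max_radius):
    """Greedy leader clustering: join the first cluster within max_radius, else start a new one."""
    clusters = []
    for s in subsequences:
        for c in clusters:
            d = hamming(s, c.representative)
            if d <= max_radius:
                c.members.append(s)
                c.radius = max(c.radius, d)
                break
        else:
            clusters.append(Cluster(representative=s, members=[s]))
    return clusters


def seed_hits(query, clusters, threshold):
    """Return members within threshold of query, plus how many members were pruned unseen."""
    hits, skipped = [], 0
    for c in clusters:
        d_rep = hamming(query, c.representative)
        # Triangle inequality: d(q, m) >= d(q, rep) - d(rep, m) >= d_rep - radius.
        if d_rep - c.radius > threshold:
            skipped += len(c.members)
            continue
        hits.extend(m for m in c.members if hamming(query, m) <= threshold)
    return hits, skipped
```

Because Hamming distance is a true metric, this pruning never discards a member that lies within the threshold; the saving comes from comparing the query with one representative instead of every member.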

  2. Sequence-structure relationships in RNA loops: establishing the basis for loop homology modeling.

    PubMed

    Schudoma, Christian; May, Patrick; Nikiforova, Viktoria; Walther, Dirk

    2010-01-01

    The specific function of RNA molecules frequently resides in their seemingly unstructured loop regions. We performed a systematic analysis of RNA loops extracted from experimentally determined three-dimensional structures of RNA molecules. A comprehensive loop-structure data set was created and organized into distinct clusters based on structural and sequence similarity. We detected clear evidence of the hallmark of homology present in the sequence-structure relationships in loops. Loops differing by <25% in sequence identity fold into very similar structures. Thus, our results support the application of homology modeling for RNA loop model building. We established a threshold that may guide the sequence divergence-based selection of template structures for RNA loop homology modeling. Of all possible sequences that are, under the assumption of isosteric relationships, theoretically compatible with actual sequences observed in RNA structures, only a small fraction is contained in the Rfam database of RNA sequences and classes, implying that the actual RNA loop space may consist of a limited number of unique loop structures and conserved sequences. The loop-structure data sets are made available via an online database, RLooM. RLooM also offers functionalities for the modeling of RNA loop structures in support of RNA engineering and design efforts.

  3. Fanconi anemia gene editing by the CRISPR/Cas9 system.

    PubMed

    Osborn, Mark J; Gabriel, Richard; Webber, Beau R; DeFeo, Anthony P; McElroy, Amber N; Jarjour, Jordan; Starker, Colby G; Wagner, John E; Joung, J Keith; Voytas, Daniel F; von Kalle, Christof; Schmidt, Manfred; Blazar, Bruce R; Tolar, Jakub

    2015-02-01

    Genome engineering with designer nucleases is a rapidly progressing field, and the ability to correct human gene mutations in situ is highly desirable. We employed fibroblasts derived from a patient with Fanconi anemia as a model to test the ability of the clustered regularly interspaced short palindromic repeats/Cas9 nuclease system to mediate gene correction. We show that the Cas9 nuclease and nickase each resulted in gene correction, but the nickase, because of its ability to preferentially mediate homology-directed repair, resulted in a higher frequency of corrected clonal isolates. To assess the off-target effects, we used both a predictive software platform to identify intragenic sequences of homology and a genome-wide screen utilizing linear amplification-mediated PCR. We observed no off-target activity and show that RNA-guided endonuclease candidate sites lacking low sequence complexity function in a highly specific manner. Collectively, we provide proof of principle for precision genome editing in Fanconi anemia, a DNA repair-deficient human disorder.

  4. Intron loss from the NADH dehydrogenase subunit 4 gene of lettuce mitochondrial DNA: evidence for homologous recombination of a cDNA intermediate.

    PubMed

    Geiss, K T; Abbas, G M; Makaroff, C A

    1994-04-01

    The mitochondrial gene coding for subunit 4 of the NADH dehydrogenase complex I (nad4) has been isolated and characterized from lettuce, Lactuca sativa. Analysis of nad4 genes in a number of plants by Southern hybridization had previously suggested that the intron content varied between species. Characterization of the lettuce gene confirms this observation. Lettuce nad4 contains two exons and one group IIA intron, whereas previously sequenced nad4 genes from turnip and wheat contain three group IIA introns. Northern analysis identified a 1600-nucleotide transcript, which represents the mature nad4 mRNA, and a 3200-nucleotide primary transcript. Sequence analysis of lettuce and turnip nad4 cDNAs was used to confirm the intron/exon border sequences and to examine RNA editing patterns. Editing is observed at the 5' and 3' ends of the lettuce transcript, but is absent from sequences that correspond to exons two, three and the 5' end of exon four in turnip and wheat. In contrast, turnip transcripts are highly edited in this region, suggesting that homologous recombination of an edited and spliced cDNA intermediate was involved in the loss of introns two and three from an ancestral lettuce nad4 gene.

  5. A prokaryotic viral sequence is expressed and conserved in mammalian brain.

    PubMed

    Yeh, Yang-Hui; Gunasekharan, Vignesh; Manuelidis, Laura

    2017-07-03

    A natural and permanent transfer of prokaryotic viral sequences to mammals has not been reported by others. Circular "SPHINX" DNAs <5 kb were previously isolated from nuclease-protected cytoplasmic particles in rodent neuronal cell lines and brain. Two of these DNAs were sequenced after Φ29 polymerase amplification, and they revealed significant but imperfect homology to segments of commensal Acinetobacter phage viruses. These findings were surprising because the brain is isolated from environmental microorganisms. The 1.76-kb DNA sequence (SPHINX 1.8), with an iteron before its ORF, was evaluated here for its expression in neural cells and brain. A rabbit affinity purified antibody generated against a peptide without homology to mammalian sequences labeled a nonglycosylated ∼41-kDa protein (spx1) on Western blots, and the signal was efficiently blocked by the competing peptide. Spx1 was resistant to limited proteinase K digestion, but was unrelated to the expression of host prion protein or its pathologic amyloid form. Remarkably, spx1 concentrated in selected brain synapses, such as those on anterior motor horn neurons that integrate many complex neural inputs. SPHINX 1.8 appears to be involved in tissue-specific differentiation, including essential functions that preserve its propagation during mammalian evolution, possibly via maternal inheritance. The data here indicate that mammals can share and exchange a larger world of prokaryotic viruses than previously envisioned.

  6. De novo identification of highly diverged protein repeats by probabilistic consistency.

    PubMed

    Biegert, A; Söding, J

    2008-03-15

    An estimated 25% of all eukaryotic proteins contain repeats, which underlines the importance of duplication for evolving new protein functions. Internal repeats often correspond to structural or functional units in proteins. Methods capable of identifying diverged repeated segments or domains at the sequence level can therefore assist in predicting domain structures, inferring hypotheses about function and mechanism, and investigating the evolution of proteins from smaller fragments. We present HHrepID, a method for the de novo identification of repeats in protein sequences. It is able to detect the sequence signature of structural repeats in many proteins that have not yet been known to possess internal sequence symmetry, such as outer membrane beta-barrels. HHrepID uses HMM-HMM comparison to exploit evolutionary information in the form of multiple sequence alignments of homologs. In contrast to a previous method, the new method (1) generates a multiple alignment of repeats; (2) utilizes the transitive nature of homology through a novel merging procedure with fully probabilistic treatment of alignments; (3) improves alignment quality through an algorithm that maximizes the expected accuracy; (4) is able to identify different kinds of repeats within complex architectures by a probabilistic domain boundary detection method and (5) improves sensitivity through a new approach to assess statistical significance. Server: http://toolkit.tuebingen.mpg.de/hhrepid; Executables: ftp://ftp.tuebingen.mpg.de/pub/protevo/HHrepID

  7. Unusual features of fibrillarin cDNA and gene structure in Euglena gracilis: evolutionary conservation of core proteins and structural predictions for methylation-guide box C/D snoRNPs throughout the domain Eucarya.

    PubMed

    Russell, Anthony G; Watanabe, Yoh-ichi; Charette, J Michael; Gray, Michael W

    2005-01-01

    Box C/D ribonucleoprotein (RNP) particles mediate O2'-methylation of rRNA and other cellular RNA species. In higher eukaryotic taxa, these RNPs are more complex than their archaeal counterparts, containing four core protein components (Snu13p, Nop56p, Nop58p and fibrillarin) compared with three in Archaea. This increase in complexity raises questions about the evolutionary emergence of the eukaryote-specific proteins and structural conservation in these RNPs throughout the eukaryotic domain. In protists, the primarily unicellular organisms comprising the bulk of eukaryotic diversity, the protein composition of box C/D RNPs has not yet been extensively explored. This study describes the complete gene, cDNA and protein sequences of the fibrillarin homolog from the protozoon Euglena gracilis, the first such information to be obtained for a nucleolus-localized protein in this organism. The E. gracilis fibrillarin gene contains a mixture of intron types exhibiting markedly different sizes. In contrast to most other E. gracilis mRNAs characterized to date, the fibrillarin mRNA lacks a spliced leader (SL) sequence. The predicted fibrillarin protein sequence itself is unusual in that it contains a glycine-lysine (GK)-rich domain at its N-terminus rather than the glycine-arginine-rich (GAR) domain found in most other eukaryotic fibrillarins. In an evolutionarily diverse collection of protists that includes E. gracilis, we have also identified putative homologs of the other core protein components of box C/D RNPs, thereby providing evidence that the protein composition seen in the higher eukaryotic complexes was established very early in eukaryotic cell evolution.

  8. Small tandemly repeated DNA sequences of higher plants likely originate from a tRNA gene ancestor.

    PubMed Central

    Benslimane, A A; Dron, M; Hartmann, C; Rode, A

    1986-01-01

    Several monomers (177 bp) of a tandemly arranged repetitive nuclear DNA sequence of Brassica oleracea have been cloned and sequenced. They share up to 95% homology between one another and up to 80% with other satellite DNA sequences of Cruciferae, suggesting a common ancestor. Both strands of these monomers show more than 50% homology with many tRNA genes; the best homologies have been obtained with Lys and His yeast mitochondrial tRNA genes (64% and 60%, respectively). These results suggest that small tandemly repeated DNA sequences of plants may have evolved from a tRNA gene ancestor. These tandem repeats have probably arisen via a process involving reverse transcription of polymerase III RNA intermediates, as is the case for interspersed DNA sequences of mammals. A model is proposed to explain the formation of such small tandemly repeated DNA sequences. PMID:3774553

  9. Novel venom gene discovery in the platypus

    PubMed Central

    2010-01-01

    Background To date, few peptides in the complex mixture of platypus venom have been identified and sequenced, in part due to the limited amounts of platypus venom available to study. We have constructed and sequenced a cDNA library from an active platypus venom gland to identify the remaining components. Results We identified 83 novel putative platypus venom genes from 13 toxin families, which are homologous to known toxins from a wide range of vertebrates (fish, reptiles, insectivores) and invertebrates (spiders, sea anemones, starfish). A number of these are expressed in tissues other than the venom gland, and at least three of these families (those with homology to toxins from distant invertebrates) may play non-toxin roles. Thus, further functional testing is required to confirm venom activity. However, the presence of similar putative toxins in such widely divergent species provides further evidence for the hypothesis that there are certain protein families that are selected preferentially during evolution to become venom peptides. We have also used homology with known proteins to speculate on the contributions of each venom component to the symptoms of platypus envenomation. Conclusions This study represents a step towards fully characterizing the first mammal venom transcriptome. We have found similarities between putative platypus toxins and those of a number of unrelated species, providing insight into the evolution of mammalian venom. PMID:20920228

  10. Novel venom gene discovery in the platypus.

    PubMed

    Whittington, Camilla M; Papenfuss, Anthony T; Locke, Devin P; Mardis, Elaine R; Wilson, Richard K; Abubucker, Sahar; Mitreva, Makedonka; Wong, Emily S W; Hsu, Arthur L; Kuchel, Philip W; Belov, Katherine; Warren, Wesley C

    2010-01-01

    To date, few peptides in the complex mixture of platypus venom have been identified and sequenced, in part due to the limited amounts of platypus venom available to study. We have constructed and sequenced a cDNA library from an active platypus venom gland to identify the remaining components. We identified 83 novel putative platypus venom genes from 13 toxin families, which are homologous to known toxins from a wide range of vertebrates (fish, reptiles, insectivores) and invertebrates (spiders, sea anemones, starfish). A number of these are expressed in tissues other than the venom gland, and at least three of these families (those with homology to toxins from distant invertebrates) may play non-toxin roles. Thus, further functional testing is required to confirm venom activity. However, the presence of similar putative toxins in such widely divergent species provides further evidence for the hypothesis that there are certain protein families that are selected preferentially during evolution to become venom peptides. We have also used homology with known proteins to speculate on the contributions of each venom component to the symptoms of platypus envenomation. This study represents a step towards fully characterizing the first mammal venom transcriptome. We have found similarities between putative platypus toxins and those of a number of unrelated species, providing insight into the evolution of mammalian venom.

  11. Cloning and expression of a cDNA coding for a human monocyte-derived plasminogen activator inhibitor.

    PubMed

    Antalis, T M; Clark, M A; Barnes, T; Lehrbach, P R; Devine, P L; Schevzov, G; Goss, N H; Stephens, R W; Tolstoshev, P

    1988-02-01

    Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding the remainder of the mPAI-2 mRNA was obtained by primer extension of U937 poly(A)+ RNA using a probe complementary to the mPAI-2 coding region. The coding sequence for mPAI-2 was placed under the control of the lambda PL promoter, and the protein expressed in Escherichia coli formed a complex with urokinase that could be detected immunologically. By nucleotide sequence analysis, mPAI-2 cDNA encodes a protein containing 415 amino acids with a predicted unglycosylated Mr of 46,543. The predicted amino acid sequence of mPAI-2 is very similar to placental PAI-2 (3 amino acid differences) and shows extensive homology with members of the serine protease inhibitor (serpin) superfamily. mPAI-2 was found to be more homologous to ovalbumin (37%) than the endothelial plasminogen activator inhibitor, PAI-1 (26%). Like ovalbumin, mPAI-2 appears to have no typical amino-terminal signal sequence. The 3' untranslated region of the mPAI-2 cDNA contains a putative regulatory sequence that has been associated with the inflammatory mediators.

  12. Cloning and expression of a cDNA coding for a human monocyte-derived plasminogen activator inhibitor.

    PubMed Central

    Antalis, T M; Clark, M A; Barnes, T; Lehrbach, P R; Devine, P L; Schevzov, G; Goss, N H; Stephens, R W; Tolstoshev, P

    1988-01-01

    Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding the remainder of the mPAI-2 mRNA was obtained by primer extension of U937 poly(A)+ RNA using a probe complementary to the mPAI-2 coding region. The coding sequence for mPAI-2 was placed under the control of the lambda PL promoter, and the protein expressed in Escherichia coli formed a complex with urokinase that could be detected immunologically. By nucleotide sequence analysis, mPAI-2 cDNA encodes a protein containing 415 amino acids with a predicted unglycosylated Mr of 46,543. The predicted amino acid sequence of mPAI-2 is very similar to placental PAI-2 (3 amino acid differences) and shows extensive homology with members of the serine protease inhibitor (serpin) superfamily. mPAI-2 was found to be more homologous to ovalbumin (37%) than the endothelial plasminogen activator inhibitor, PAI-1 (26%). Like ovalbumin, mPAI-2 appears to have no typical amino-terminal signal sequence. The 3' untranslated region of the mPAI-2 cDNA contains a putative regulatory sequence that has been associated with the inflammatory mediators. PMID:3257578

  13. Whole genome analysis of CRISPR Cas9 sgRNA off-target homologies via an efficient computational algorithm.

    PubMed

    Zhou, Hong; Zhou, Michael; Li, Daisy; Manthey, Joseph; Lioutikova, Ekaterina; Wang, Hong; Zeng, Xiao

    2017-11-17

    The beauty and power of the CRISPR Cas9 endonuclease genome editing system lie in the fact that it is RNA-programmable: Cas9 can be guided to any genomic locus complementary to a 20-nt single guide RNA (sgRNA) to cleave double-stranded DNA, allowing the introduction of desired mutations. Unfortunately, it has been reported repeatedly that the sgRNA can also guide Cas9 to off-target sites where the DNA sequence is homologous to the sgRNA. Using the human genome and Streptococcus pyogenes Cas9 (SpCas9) as an example, this article mathematically analyzed the probabilities of off-target homologies of sgRNAs and found that, for a genome as large as the human genome, potential off-target homologies are unavoidable in sgRNA selection. A highly efficient computational algorithm was developed for whole-genome sgRNA design and off-target homology searches. By means of a dynamically constructed sequence-indexed database and a simplified sequence alignment method, this algorithm achieves very high efficiency while guaranteeing the identification of all existing potential off-target homologies. Using this algorithm, 1,876,775 sgRNAs were designed for the 19,153 human mRNA genes, and only two sgRNAs were found to be free of off-target homology. Genome-wide sgRNA design and off-target analysis with this algorithm confirmed the mathematical analysis that it is almost impossible for an sgRNA sequence to escape potential off-target homologies. Future innovations in CRISPR Cas9 gene editing technology need to focus on how to eliminate Cas9 off-target activity.
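The flavor of the probability argument above can be reproduced with a back-of-envelope calculation: count how many 20-nt sequences lie within k substitutions of a guide, then multiply by the number of genomic positions and the chance of a random match. This is a simplified sketch, not the article's analysis. It assumes a uniformly random genome with independent bases (real genomes are not random), scans both strands, ignores bulges, and models an NGG PAM as two fixed bases (probability 1/16).

```python
from math import comb


def neighbourhood_size(length: int, max_mismatches: int) -> int:
    """Number of DNA strings of `length` within `max_mismatches` substitutions of a fixed string."""
    return sum(comb(length, i) * 3 ** i for i in range(max_mismatches + 1))


def expected_chance_sites(genome_size: int, max_mismatches: int,
                          protospacer_len: int = 20, pam_probability: float = 1 / 16) -> float:
    """Expected number of random genomic sites matching a guide within max_mismatches.

    Assumes a uniform random genome; both strands are scanned (2 * genome_size positions).
    """
    positions = 2 * genome_size
    p_match = neighbourhood_size(protospacer_len, max_mismatches) / 4 ** protospacer_len
    return positions * p_match * pam_probability


# Under these assumptions, print how the expected count grows with allowed mismatches.
for k in range(5):
    print(k, expected_chance_sites(3_100_000_000, k))
```

Each extra allowed mismatch multiplies the neighbourhood by a large factor, which is why, once a few mismatches are tolerated, chance matches in a genome of human size become hard to avoid.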

  14. Synthetic oligonucleotide probes deduced from amino acid sequence data. Theoretical and practical considerations.

    PubMed

    Lathe, R

    1985-05-05

    Synthetic probes deduced from amino acid sequence data are widely used to detect cognate coding sequences in libraries of cloned DNA segments. The redundancy of the genetic code dictates that a choice must be made between (1) a mixture of probes reflecting all codon combinations, and (2) a single longer "optimal" probe. The second strategy is examined in detail. The frequency of sequences matching a given probe by chance alone can be determined and also the frequency of sequences closely resembling the probe and contributing to the hybridization background. Gene banks cannot be treated as random associations of the four nucleotides, and probe sequences deduced from amino acid sequence data occur more often than predicted by chance alone. Probe lengths must be increased to confer the necessary specificity. Examination of hybrids formed between unique homologous probes and their cognate targets reveals that short stretches of perfect homology occurring by chance make a significant contribution to the hybridization background. Statistical methods for improving homology are examined, taking human coding sequences as an example, and considerations of codon utilization and dinucleotide frequencies yield an overall homology of greater than 82%. Recommendations for probe design and hybridization are presented, and the choice between using multiple probes reflecting all codon possibilities and a unique optimal probe is discussed.
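Strategy (1) above, a mixture of probes covering all codon combinations, has a size equal to the product of the number of codons for each residue. The sketch below computes that size from the standard genetic code and picks the least degenerate window of a protein. It is an illustration only: the window-selection helper is a hypothetical convenience, and it does not implement Lathe's preferred strategy (2), a single longer "optimal" probe chosen using codon usage and dinucleotide frequencies.

```python
# Number of codons for each amino acid in the standard genetic code (61 sense codons).
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def mixture_size(peptide: str) -> int:
    """Number of distinct oligonucleotides needed to cover every codon combination."""
    size = 1
    for residue in peptide.upper():
        size *= CODONS_PER_RESIDUE[residue]
    return size


def least_degenerate_window(protein: str, n_residues: int):
    """Return (start, mixture size) of the window with the smallest probe mixture."""
    if not 0 < n_residues <= len(protein):
        raise ValueError("window length must be between 1 and the protein length")
    return min(
        ((start, mixture_size(protein[start:start + n_residues]))
         for start in range(len(protein) - n_residues + 1)),
        key=lambda item: item[1],
    )


print(least_degenerate_window("LLMWKL", 2))  # prints (2, 1)
```

Windows rich in Met and Trp (one codon each) keep mixtures small, while Leu, Ser and Arg (six codons each) inflate them quickly.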

  15. Comparative genomic survey, exon-intron annotation and phylogenetic analysis of NAT-homologous sequences in archaea, protists, fungi, viruses, and invertebrates

    USDA-ARS's Scientific Manuscript database

    We have previously published extensive genomic surveys [1-3], reporting NAT-homologous sequences in hundreds of sequenced bacterial, fungal and vertebrate genomes. We present here the results of our latest search of 2445 genomes, representing 1532 (70 archaeal, 1210 bacterial, 43 protist, 97 fungal,...

  16. Evidence of protein-free homology recognition in magnetic bead force-extension experiments

    NASA Astrophysics Data System (ADS)

    O'Lee, D. J.; Danilowicz, C.; Rochester, C.; Kornyshev, A. A.; Prentiss, M.

    2016-07-01

    Earlier theoretical studies have proposed that the homology-dependent pairing of large tracts of dsDNA may be due to physical interactions between homologous regions. Such interactions could contribute to the sequence-dependent pairing of chromosome regions that may occur in the presence or absence of double-strand breaks. Several experiments have indicated the recognition of homologous sequences in pure electrolytic solutions without proteins. Here, we report single-molecule force experiments with a designed 60 kb long dsDNA construct, with one end attached to a solid surface and the other end to a magnetic bead. The 60 kb constructs contain two 10 kb long homologous tracts oriented head to head, so that their sequences match if the two tracts fold onto each other. The distance between the bead and the surface is measured as a function of the force applied to the bead. At low forces, the construct molecules extend substantially less than normal control dsDNA, indicating a preferential interaction between the homologous regions. Increasing the force unfolds the paired homologous regions continuously rather than abruptly. Simple semi-phenomenological models of the unfolding mechanics are proposed, and their predictions are compared with the data.

  17. A draft sequence of the rice genome (Oryza sativa L. ssp. indica).

    PubMed

    Yu, Jun; Hu, Songnian; Wang, Jun; Wong, Gane Ka-Shu; Li, Songgang; Liu, Bin; Deng, Yajun; Dai, Li; Zhou, Yan; Zhang, Xiuqing; Cao, Mengliang; Liu, Jing; Sun, Jiandong; Tang, Jiabin; Chen, Yanjiong; Huang, Xiaobing; Lin, Wei; Ye, Chen; Tong, Wei; Cong, Lijuan; Geng, Jianing; Han, Yujun; Li, Lin; Li, Wei; Hu, Guangqiang; Huang, Xiangang; Li, Wenjie; Li, Jian; Liu, Zhanwei; Li, Long; Liu, Jianping; Qi, Qiuhui; Liu, Jinsong; Li, Li; Li, Tao; Wang, Xuegang; Lu, Hong; Wu, Tingting; Zhu, Miao; Ni, Peixiang; Han, Hua; Dong, Wei; Ren, Xiaoyu; Feng, Xiaoli; Cui, Peng; Li, Xianran; Wang, Hao; Xu, Xin; Zhai, Wenxue; Xu, Zhao; Zhang, Jinsong; He, Sijie; Zhang, Jianguo; Xu, Jichen; Zhang, Kunlin; Zheng, Xianwu; Dong, Jianhai; Zeng, Wanyong; Tao, Lin; Ye, Jia; Tan, Jun; Ren, Xide; Chen, Xuewei; He, Jun; Liu, Daofeng; Tian, Wei; Tian, Chaoguang; Xia, Hongai; Bao, Qiyu; Li, Gang; Gao, Hui; Cao, Ting; Wang, Juan; Zhao, Wenming; Li, Ping; Chen, Wei; Wang, Xudong; Zhang, Yong; Hu, Jianfei; Wang, Jing; Liu, Song; Yang, Jian; Zhang, Guangyu; Xiong, Yuqing; Li, Zhijie; Mao, Long; Zhou, Chengshu; Zhu, Zhen; Chen, Runsheng; Hao, Bailin; Zheng, Weimou; Chen, Shouyi; Guo, Wei; Li, Guojie; Liu, Siqi; Tao, Ming; Wang, Jian; Zhu, Lihuang; Yuan, Longping; Yang, Huanming

    2002-04-05

    We have produced a draft sequence of the rice genome for the most widely cultivated subspecies in China, Oryza sativa L. ssp. indica, by whole-genome shotgun sequencing. The genome was 466 megabases in size, with an estimated 46,022 to 55,615 genes. Functional coverage in the assembled sequences was 92.0%. About 42.2% of the genome was in exact 20-nucleotide oligomer repeats, and most of the transposons were in the intergenic regions between genes. Although 80.6% of predicted Arabidopsis thaliana genes had a homolog in rice, only 49.4% of predicted rice genes had a homolog in A. thaliana. The large proportion of rice genes with no recognizable homologs is due to a gradient in the GC content of rice coding sequences.
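
    The statement that about 42.2% of the genome lies in exact 20-nucleotide oligomer repeats refers to positions covered by 20-mers that occur more than once. A minimal sketch of that kind of measurement is below; it is a simplification, not the authors' pipeline, because it scans one strand only (no reverse complements) and holds all k-mers in memory, which would not scale to a whole plant genome.

```python
from collections import Counter


def repeat_fraction(seq: str, k: int = 20) -> float:
    """Fraction of positions covered by at least one k-mer that occurs more than once.

    Single-strand only: reverse-complement repeats are not counted.
    """
    seq = seq.upper()
    n = len(seq)
    if n < k:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(n - k + 1))
    covered = [False] * n
    for i in range(n - k + 1):
        if counts[seq[i:i + k]] > 1:
            # Mark every base of this repeated k-mer occurrence.
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / n


print(repeat_fraction("ACGTACGT", k=4))
print(repeat_fraction("ACGTTGCA", k=4))
```

    With a small k the toy inputs show the two extremes: a perfect tandem duplicate is fully covered, and a sequence with no repeated 4-mer is not covered at all.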

  18. Improve homology search sensitivity of PacBio data by correcting frameshifts.

    PubMed

    Du, Nan; Sun, Yanni

    2016-09-01

    Single-molecule, real-time (SMRT) sequencing developed by Pacific Biosciences produces longer reads than second-generation sequencing technologies such as Illumina. The long read length enables PacBio sequencing to close gaps in genome assemblies, reveal structural variations, and identify gene isoforms with higher accuracy in transcriptomic sequencing. However, PacBio data have a high sequencing error rate, and most of the errors are insertions or deletions. During alignment-based homology search, insertion or deletion errors in genes cause frameshifts and may lead to only marginal alignment scores and short alignments. As a result, it is hard to distinguish true alignments from random ones, and this ambiguity introduces errors into structural and functional annotation. Existing frameshift correction tools are designed for data with much lower error rates and are not optimized for PacBio data. As an increasing number of groups use SMRT sequencing, there is an urgent need for dedicated homology search tools for PacBio data. In this work, we introduce Frame-Pro, a profile homology search tool for PacBio reads. Our tool corrects sequencing errors and also outputs the profile alignments of the corrected sequences against characterized protein families. We applied our tool to both simulated and real PacBio data. The results showed that our method enables more sensitive homology search, especially for PacBio data sets of low sequencing coverage. In addition, it corrects more errors than a popular error correction tool that does not rely on hybrid sequencing. The source code is freely available at https://sourceforge.net/projects/frame-pro/. Contact: yannisun@msu.edu. © The Author 2016. Published by Oxford University Press. All rights reserved.
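
    Why a single indel is so damaging to protein-level homology search can be seen by translating a sequence before and after one inserted base. The sketch below only demonstrates the frameshift effect using the standard genetic code; it is not Frame-Pro's correction algorithm, and the toy sequence is hypothetical.

```python
# Standard genetic code, indexed by codon in T, C, A, G order for each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate reading frame 0, ignoring any trailing partial codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


true_gene = "ATGGCTGCTGCT"                 # Met-Ala-Ala-Ala
read = true_gene[:3] + "T" + true_gene[3:]  # one inserted base, as in a PacBio read

print(translate(true_gene))  # the intended protein
print(translate(read))       # every codon after the insertion is misread
print(translate(read[4:]))   # shifting the frame past the insertion recovers the Ala run
```

    After the insertion every downstream codon is read in the wrong frame, so the read aligns to the true protein only over the short stretch before the error. This is why frameshift-aware alignment, or correcting the indel first, restores sensitivity.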

  19. An additional function of the rough endoplasmic reticulum protein complex prolyl 3-hydroxylase 1·cartilage-associated protein·cyclophilin B: the CXXXC motif reveals disulfide isomerase activity in vitro.

    PubMed

    Ishikawa, Yoshihiro; Bächinger, Hans Peter

    2013-11-01

    Collagen biosynthesis occurs in the rough endoplasmic reticulum, and many molecular chaperones and folding enzymes are involved in this process. The folding mechanism of type I procollagen has been well characterized, and protein disulfide isomerase (PDI) has been suggested as a key player in the formation of the correct disulfide bonds in the noncollagenous carboxyl-terminal and amino-terminal propeptides. Prolyl 3-hydroxylase 1 (P3H1) forms a hetero-trimeric complex with cartilage-associated protein and cyclophilin B (CypB). This complex is a multifunctional complex acting as a prolyl 3-hydroxylase, a peptidyl prolyl cis-trans isomerase, and a molecular chaperone. Two major domains are predicted from the primary sequence of P3H1: an amino-terminal domain and a carboxyl-terminal domain corresponding to the 2-oxoglutarate- and iron-dependent dioxygenase domains similar to the α-subunit of prolyl 4-hydroxylase and lysyl hydroxylases. The amino-terminal domain contains four CXXXC sequence repeats. The primary sequence of cartilage-associated protein is homologous to the amino-terminal domain of P3H1 and also contains four CXXXC sequence repeats. However, the function of the CXXXC sequence repeats is not known. Several publications have reported that short peptides containing a CXC or a CXXC sequence show oxido-reductase activity similar to PDI in vitro. We hypothesize that CXXXC motifs have oxido-reductase activity similar to the CXXC motif in PDI. We have tested the enzyme activities on model substrates in vitro using a GCRALCG peptide and the P3H1 complex. Our results suggest that this complex could function as a disulfide isomerase in the rough endoplasmic reticulum.
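
    Locating CXC, CXXC and CXXXC motifs in a protein sequence is a simple pattern search. The sketch below uses a regular expression with a lookahead so that overlapping motifs are all reported. The helper name find_cxnc is hypothetical, and X is taken to be any of the 26 uppercase letters.

```python
import re


def find_cxnc(protein: str, n: int) -> list[int]:
    """0-based start positions of C-X(n)-C motifs, including overlapping ones."""
    # A lookahead matches without consuming characters, so overlaps are not skipped.
    pattern = re.compile(rf"(?=C[A-Z]{{{n}}}C)")
    return [m.start() for m in pattern.finditer(protein.upper())]


peptide = "GCRALCG"              # model peptide named in the abstract
print(find_cxnc(peptide, 3))     # CXXXC motif
print(find_cxnc(peptide, 2))     # no CXXC motif
```

    The model peptide GCRALCG contains exactly one CXXXC motif, starting at its first cysteine, and no CXXC motif. Scanning a full P3H1 or cartilage-associated protein sequence the same way would list every repeat.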

  20. Molecular cloning and characterization of rhesus monkey platelet glycoprotein Ibα, a major ligand-binding subunit of GPIb-IX-V complex.

    PubMed

    Qiao, Jianlin; Shen, Yang; Shi, Meimei; Lu, Yanrong; Cheng, Jingqiu; Chen, Younan

    2014-05-01

    Through binding to von Willebrand factor (VWF), platelet glycoprotein (GP) Ibα, the major ligand-binding subunit of the GPIb-IX-V complex, initiates platelet adhesion and aggregation in response to exposed VWF or elevated fluid-shear stress. There are few data on non-human primate platelet GPIbα. This study cloned and characterized rhesus monkey (Macaca mulatta) platelet GPIbα. DNAMAN software was used for sequence analysis and alignment. N/O-glycosylation sites were predicted with the online tools OGPET v1.0 and NetOGlyc 1.0 Server, and the 3-D structure was modelled with SWISS-MODEL. Platelet function was evaluated by ADP- or ristocetin-induced platelet aggregation. Rhesus monkey GPIbα contains 2,268 nucleotides with an open reading frame encoding 755 amino acids. The rhesus monkey GPIbα nucleotide and protein sequences share 93.27% and 89.20% homology, respectively, with human. Sequences encoding the leucine-rich repeats of rhesus monkey GPIbα share strong similarity with human, whereas PEST sequences and N/O-glycosylated residues vary. The GPIbα-binding residues for thrombin, filamin A and 14-3-3ζ are highly conserved between rhesus monkey and human. Platelet function analysis revealed that monkey and human platelets respond similarly to ADP, but rhesus monkey platelets failed to respond to low doses of ristocetin at which human platelets achieved 76% aggregation. However, monkey platelets aggregated in response to higher ristocetin doses. Monkey GPIbα shares strong homology with human GPIbα; however, there are some differences in rhesus monkey platelet activation through GPIbα engagement, which need to be considered when using rhesus monkey platelets to investigate platelet GPIbα function. Copyright © 2014 Elsevier Ltd. All rights reserved.
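
    Percentage figures such as 93.27% and 89.20% are computed from a pairwise alignment, and the result depends on how gapped columns are treated. The sketch below assumes one common convention (identical columns divided by columns where neither sequence has a gap). It is not necessarily what DNAMAN reports. It also checks that a 755-residue ORF plus a stop codon accounts for 2,268 nucleotides.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical columns / ungapped columns, as a percentage (one convention among several)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aln_a.upper(), aln_b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def coding_length_nt(n_residues: int) -> int:
    """Nucleotides in an ORF encoding n_residues, counting one stop codon."""
    return 3 * (n_residues + 1)


print(percent_identity("ACGA", "ACGT"))
print(coding_length_nt(755))  # matches the 2,268 nt reported for rhesus GPIbα
```

    Other conventions divide by the full alignment length or by the length of the shorter sequence, so identity values from different programs are only comparable when the convention is known.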

  1. A generalized global alignment algorithm.

    PubMed

    Huang, Xiaoqiu; Chao, Kun-Mao

    2003-01-22

    Homologous sequences are sometimes similar over some regions but different over other regions. Homologous sequences have a much lower global similarity if the different regions are much longer than the similar regions. We present a generalized global alignment algorithm for comparing sequences with intermittent similarities, an ordered list of similar regions separated by different regions. A generalized global alignment model is defined to handle sequences with intermittent similarities. A dynamic programming algorithm is designed to compute an optimal general alignment in time proportional to the product of sequence lengths and in space proportional to the sum of sequence lengths. The algorithm is implemented as a computer program named GAP3 (Global Alignment Program Version 3). The generalized global alignment model is validated by experimental results produced with GAP3 on both DNA and protein sequences. The GAP3 program extends the ability of standard global alignment programs to recognize homologous sequences of lower similarity. The GAP3 program is freely available for academic use at http://bioinformatics.iastate.edu/aat/align/align.html.
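
    The time and space bounds quoted above (time proportional to the product of the lengths, space proportional to their sum) are those of a row-by-row dynamic program. The sketch below computes a standard Needleman-Wunsch global alignment score with linear gap penalties and keeps only two rows. It is not GAP3's generalized model for intermittent similarities, and the scoring parameters are arbitrary illustrative choices. Recovering the alignment itself in linear space needs an extra divide-and-conquer step (Hirschberg's method), which is not shown.

```python
def global_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score in O(len(a) * len(b)) time and O(len(b)) space."""
    prev = [j * gap for j in range(len(b) + 1)]   # row for an empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr                               # discard the older row
    return prev[-1]


print(global_score("ACGT", "ACGT"))
print(global_score("ACGT", "AGT"))
```

    With these parameters a long dissimilar region pays a gap or mismatch penalty at every position, which illustrates why ordinary global alignment underrates sequences with intermittent similarity.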

  2. The sequence, and its evolutionary implications, of a Thermococcus celer protein associated with transcription

    NASA Technical Reports Server (NTRS)

    Kaine, B. P.; Mehr, I. J.; Woese, C. R.

    1994-01-01

    Through random search, a gene from Thermococcus celer has been identified and sequenced that appears to encode a transcription-associated protein (110 amino acid residues). The sequence has clear homology to approximately the last half of an open reading frame reported previously for Sulfolobus acidocaldarius [Langer, D. & Zillig, W. (1993) Nucleic Acids Res. 21, 2251]. The protein translations of these two archaeal genes in turn are homologs of a small subunit found in eukaryotic RNA polymerase I (A12.2) and the counterpart of this from RNA polymerase II (B12.6). Homology is also seen with the eukaryotic transcription factor TFIIS, but it involves only the terminal 45 amino acids of the archaeal proteins. Evolutionary implications of these homologies are discussed.

  3. BLAST and FASTA similarity searching for multiple sequence alignment.

    PubMed

    Pearson, William R

    2014-01-01

    BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry, i.e. homology. The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look-back times of 1 to 2 billion years; DNA:DNA searches are 5-10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies targeted at distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g. mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.
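
    The expectation values recommended above come from Karlin-Altschul statistics, in which E = K·m·n·e^(-λS) for a raw score S and search space m × n. Equivalently, E = m·n·2^(-S') for the bit score S' = (λS - ln K)/ln 2. The sketch below implements both forms. The λ and K values are the ones commonly reported by BLAST for BLOSUM62 with gap costs 11/1, used here only as an illustration; real searches use program-specific parameters and effective lengths, which are ignored here.

```python
import math


def evalue_from_raw(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expectation: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * score)


def bit_score(score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * score - math.log(k)) / math.log(2)


def evalue_from_bits(bits: float, m: int, n: int) -> float:
    """Equivalent form: E = m * n * 2**(-S')."""
    return m * n * 2.0 ** (-bits)


LAM, K = 0.267, 0.041             # illustrative BLOSUM62 (11/1) parameters
query_len, db_len = 300, 10 ** 8  # hypothetical query and database sizes
for raw in (60, 100, 140):
    bits = bit_score(raw, LAM, K)
    print(raw, round(bits, 1), evalue_from_raw(raw, query_len, db_len, LAM, K))
```

    Because E scales with the database size, the same alignment score becomes more significant in a smaller database, which is the statistical basis for the advice above to search smaller comprehensive databases.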

  4. The primary structures of two yeast enolase genes. Homology between the 5' noncoding flanking regions of yeast enolase and glyceraldehyde-3-phosphate dehydrogenase genes.

    PubMed

    Holland, M J; Holland, J P; Thill, G P; Jackson, K A

    1981-02-10

    Segments of yeast genomic DNA containing two enolase structural genes have been isolated by subculture cloning procedures using a cDNA hybridization probe synthesized from purified yeast enolase mRNA. Based on restriction endonuclease and transcriptional maps of these two segments of yeast DNA, each hybrid plasmid contains a region of extensive nucleotide sequence homology which forms hybrids with the cDNA probe. The DNA sequences which flank this homologous region in the two hybrid plasmids are nonhomologous, indicating that these sequences are nontandemly repeated in the yeast genome. The complete nucleotide sequence of the coding as well as the flanking noncoding regions of these genes has been determined. The amino acid sequence predicted from one reading frame of both structural genes is extremely similar to that determined for yeast enolase (Chin, C. C. Q., Brewer, J. M., Eckard, E., and Wold, F. (1981) J. Biol. Chem. 256, 1370-1376), confirming that these isolated structural genes encode yeast enolase. The nucleotide sequences of the coding regions of the genes are approximately 95% homologous, and neither gene contains an intervening sequence. Codon utilization in the enolase genes follows the same biased pattern previously described for two yeast glyceraldehyde-3-phosphate dehydrogenase structural genes (Holland, J. P., and Holland, M. J. (1980) J. Biol. Chem. 255, 2596-2605). DNA blotting analysis confirmed that the isolated segments of yeast DNA are colinear with yeast genomic DNA and that there are two nontandemly repeated enolase genes per haploid yeast genome. The noncoding portions of the two enolase genes adjacent to the initiation and termination codons are approximately 70% homologous and contain sequences thought to be involved in the synthesis and processing of messenger RNA. Finally, there are regions of extensive homology between the two enolase structural genes and two yeast glyceraldehyde-3-phosphate dehydrogenase structural genes within the 5' noncoding portions of these glycolytic genes.
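
    A biased codon utilization pattern of the kind described above is usually quantified by counting codons in frame and comparing each codon with its synonyms. The sketch below uses relative synonymous codon usage (RSCU), a standard measure chosen here for illustration and not necessarily the one used in the paper. RSCU is the observed count of a codon divided by the mean count of all codons for the same amino acid, so values above 1 mark preferred codons. The input sequence is a hypothetical toy.

```python
from collections import Counter

# Standard genetic code, indexed by codon in T, C, A, G order for each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_counts(cds: str) -> Counter:
    """In-frame codon counts, ignoring a trailing partial codon."""
    cds = cds.upper()
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))


def rscu(cds: str) -> dict[str, float]:
    """Relative synonymous codon usage for every amino acid that occurs in cds."""
    counts = codon_counts(cds)
    synonyms: dict[str, list[str]] = {}
    for codon, aa in CODON_TABLE.items():
        synonyms.setdefault(aa, []).append(codon)
    result = {}
    for aa, codons in synonyms.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(codons)  # count per codon if usage were uniform
        for c in codons:
            result[c] = counts[c] / expected
    return result


usage = rscu("ATGGCTGCTGCC")
print({c: round(v, 2) for c, v in usage.items() if c.startswith("GC")})
```

    In the toy input, GCT is used twice out of three Ala codons, which gives an RSCU well above 1. A gene family with a shared bias would show the same codons consistently enriched.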

  5. Evolutionary profiles from the QR factorization of multiple sequence alignments

    PubMed Central

    Sethi, Anurag; O'Donoghue, Patrick; Luthey-Schulten, Zaida

    2005-01-01

    We present an algorithm to generate complete evolutionary profiles that represent the topology of the molecular phylogenetic tree of the homologous group. The method, based on the multidimensional QR factorization of numerically encoded multiple sequence alignments, removes redundancy from the alignments and orders the protein sequences by increasing linear dependence, resulting in the identification of a minimal basis set of sequences that spans the evolutionary space of the homologous group of proteins. We observe a general trend that these smaller, more evolutionarily balanced profiles have comparable and, in many cases, better performance in database searches than conventional profiles containing hundreds of sequences, constructed in an iterative and computationally intensive procedure. For more diverse families or superfamilies, with sequence identity <30%, structural alignments, based purely on the geometry of the protein structures, provide better alignments than pure sequence-based methods. Merging the structure and sequence information allows the construction of accurate profiles for distantly related groups. These structure-based profiles outperformed other sequence-based methods for finding distant homologs and were used to identify a putative class II cysteinyl-tRNA synthetase (CysRS) in several archaea that eluded previous annotation studies. Phylogenetic analysis showed the putative class II CysRSs to be a monophyletic group and homology modeling revealed a constellation of active site residues similar to that in the known class I CysRS. PMID:15741270
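
    The idea of ordering aligned sequences by increasing linear dependence can be sketched with QR factorization with column pivoting. This minimal version assumes a simple one-hot encoding of each aligned residue, whereas the paper's numerical encoding may differ. It pivots by Gram-Schmidt: at each step it picks the sequence with the largest component not yet explained by the sequences already chosen, and it stops once the remainder is numerically zero. The sequences that are never picked are the redundant ones. In practice a library routine such as scipy.linalg.qr(..., pivoting=True) would normally be used instead.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus gap (assumed encoding)


def one_hot(alignment: list[str]) -> np.ndarray:
    """One column per aligned sequence, one block of len(ALPHABET) rows per alignment column."""
    length = len(alignment[0])
    X = np.zeros((length * len(ALPHABET), len(alignment)))
    for j, seq in enumerate(alignment):
        for i, ch in enumerate(seq):
            X[i * len(ALPHABET) + ALPHABET.index(ch), j] = 1.0
    return X


def pivot_order(X: np.ndarray, tol: float = 1e-10) -> list[int]:
    """Column-pivoted Gram-Schmidt: indices in order of decreasing residual norm."""
    R = X.astype(float).copy()
    remaining = list(range(X.shape[1]))
    order = []
    while remaining:
        norms = [np.linalg.norm(R[:, j]) for j in remaining]
        k = int(np.argmax(norms))
        if norms[k] < tol:
            break  # everything left is linearly dependent on the chosen basis
        j = remaining.pop(k)
        q = R[:, j] / norms[k]
        order.append(j)
        for r in remaining:
            R[:, r] -= (q @ R[:, r]) * q  # remove the component along q
    return order


aln = ["ACDE", "ACDE", "ACDF"]  # toy alignment: sequence 1 duplicates sequence 0
print(pivot_order(one_hot(aln)))
```

    In the toy alignment the exact duplicate is dropped, and the near-duplicate is kept because it still contributes a new direction. Applied to a real family, this yields a small basis set of sequences.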

  6. Identification of both copy number variation-type and constant-type core elements in a large segmental duplication region of the mouse genome

    PubMed Central

    2013-01-01

    Background Copy number variation (CNV), an important source of diversity in genomic structure, is frequently found in clusters called CNV regions (CNVRs). CNVRs are strongly associated with segmental duplications (SDs), but the composition of these complex repetitive structures remains unclear. Results We conducted self-comparative-plot analysis of all mouse chromosomes using the high-speed, large-scale homology search algorithm SHEAP. For eight chromosomes, we identified various types of large SDs as tartan-checked patterns within the self-comparative plots. A complex arrangement of diagonal split lines in the self-comparative plots indicated the presence of large homologous repetitive sequences. We focused on one SD on chromosome 13 (SD13M), and developed SHEPHERD, a stepwise ab initio method, to extract longer repetitive elements and to characterize repetitive structures in this region. Analysis using SHEPHERD showed the existence of 60 core elements, which were expected to be the basic units that form SDs within the repetitive structure of SD13M. The finding that sequences homologous to the core elements (>70% homology) covered approximately 90% of the SD13M region indicated that our method can characterize the repetitive structure of SD13M effectively. Core elements were composed largely of fragmented repeats of previously identified types, such as long interspersed nuclear elements (LINEs), together with partial genic regions. Comparative genome hybridization array analysis showed that whereas 42 core elements were components of CNVRs that varied among mouse strains, 8 did not vary among strains (constant type), and the status of the others could not be determined. The CNV-type core elements contained significantly larger proportions of long terminal repeat (LTR) types of retrotransposon than the constant-type core elements, which had no CNV. The higher divergence rates observed in the CNV-type core elements than in the constant type indicate that the CNV-type core elements have a longer evolutionary history than constant-type core elements in SD13M. Conclusions Our methodology for the identification of repetitive core sequences simplifies characterization of the structures of large SDs and detailed analysis of CNV. The results of detailed structural and quantitative analyses in this study might help to elucidate the biological role of one of the SDs on chromosome 13. PMID:23834397
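
    A self-comparative plot is a dot plot of a sequence against itself. Duplicated segments appear as diagonal lines away from the main diagonal. The sketch below produces the dots with a shared k-mer index. It is only an illustration of the concept, not SHEAP's algorithm. It considers direct repeats only, so inverted repeats, which need reverse-complement matching, would not appear.

```python
from collections import defaultdict


def self_dotplot(seq: str, k: int) -> list[tuple[int, int]]:
    """Off-diagonal (i, j) pairs where the k-mers starting at i and j are identical."""
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    dots = []
    for hits in positions.values():
        for i in hits:
            for j in hits:
                if i != j:  # skip the trivial main diagonal
                    dots.append((i, j))
    return sorted(dots)


print(self_dotplot("ACGTACGT", 4))
```

    For a long tandem duplication, consecutive dots (i, j), (i + 1, j + 1), and so on form the off-diagonal lines, and overlapping families of such lines give the tartan-like patterns described above.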

  7. Exploring representations of protein structure for automated remote homology detection and mapping of protein structure space

    PubMed Central

    2014-01-01

    Background Due to rapid genome sequencing, there are now millions of deposited protein sequences with no known function. Fast sequence-based comparisons allow close homologs of a protein of interest to be detected so that functional information can be transferred from the homologs to the given protein. Sequence-based comparison cannot detect remote homologs, in which evolution has changed the sequence while largely preserving the structure. Structure-based comparisons can detect remote homologs, but most methods for doing so are too expensive to apply at large scale over structural databases of proteins. Recently, fragment-based structural representations have been proposed that allow fast detection of remote homologs with reasonable accuracy. These representations have also been used to obtain linearly-reducible maps of protein structure space. It has been shown, and is further supported by the analysis in this paper, that such maps preserve the functional co-localization of the protein structure space. Methods Inspired by a recent application of the Latent Dirichlet Allocation (LDA) model to structural comparisons of proteins, we propose higher-order, LDA-obtained, topic-based representations of protein structures as an alternative route for remote homology detection and for organizing the protein structure space in few dimensions. Various techniques based on natural language processing are proposed and employed to aid the analysis of topics in the protein structure domain. Results We show that a topic-based representation is just as effective as a fragment-based one at automated detection of remote homologs and organization of protein structure space. We conduct a detailed analysis of the information content in the topic-based representation, showing that topics have semantic meaning. The fragment-based and topic-based representations are also shown to allow prediction of superfamily membership. Conclusions This work opens exciting avenues in designing novel representations to extract information about protein structures, as well as in organizing and mining protein structure space with mature text mining tools. PMID:25080993
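
    Both fragment-based and topic-based representations turn a structure into a vector that can be compared quickly. The sketch below assumes each structure has already been reduced to a list of fragment labels from some fragment library. The labels and the library are hypothetical, and the assignment step is not shown. It builds count vectors and compares them by cosine similarity, one common choice and not necessarily the measure used in the paper. A topic vector from LDA could be compared the same way.

```python
import math
from collections import Counter


def fragment_profile(fragment_labels: list[str]) -> Counter:
    """Bag-of-fragments representation: how often each library fragment occurs."""
    return Counter(fragment_labels)


def cosine(p: Counter, q: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(p[f] * q[f] for f in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


# Hypothetical fragment labels for three structures.
s1 = fragment_profile(["f1", "f1", "f2", "f3"])
s2 = fragment_profile(["f1", "f2", "f2", "f3"])
s3 = fragment_profile(["f7", "f8"])
print(round(cosine(s1, s2), 3), cosine(s1, s3))
```

    Because the vectors ignore fragment order, comparisons are cheap enough to run over an entire structure database. That speed is what makes such representations attractive for remote homology screening.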

  8. Generation and Analysis of Expressed Sequence Tags from Olea europaea L.

    PubMed Central

    Ozdemir Ozgenturk, Nehir; Oruç, Fatma; Sezerman, Ugur; Kuçukural, Alper; Vural Korkut, Senay; Toksoz, Feriha; Un, Cemal

    2010-01-01

    Olive (Olea europaea L.), which originated in the Near East, is an important source of edible oil. In this study, two cDNA libraries were constructed from young olive leaves and immature olive fruits to generate ESTs, with the aims of discovering novel genes and investigating the function of unknown olive genes. A total of 3840 randomly selected colonies were sequenced for EST collection from both libraries. The 2228 readable sequences from olive leaf and 1506 from olive fruit were assembled into 205 and 69 contigs, respectively, whereas 2478 were singletons. Putative functions of all 2752 differentially expressed unique sequences were assigned by gene homology based on BLAST and annotated using BLAST2GO. While 1339 ESTs show no homology to the database, 2024 ESTs have homology (under 80%) with hypothetical proteins, putative proteins, expressed proteins, and unknown proteins in NCBI GenBank. In addition, 635 unique EST sequences were identified by over 80% homology to genes of known function in other species that had not previously been described in the Olea family. Only 3.1% of all ESTs showed similarity to the olive sequences already in NCBI. The EST data and consensus sequences generated here were submitted to NCBI as a valuable resource for functional genomic studies of olive. PMID:21197085

  9. Genomewide Function Conservation and Phylogeny in the Herpesviridae

    PubMed Central

    Albà, M. Mar; Das, Rhiju; Orengo, Christine A.; Kellam, Paul

    2001-01-01

    The Herpesviridae are a large group of well-characterized double-stranded DNA viruses for which many complete genome sequences have been determined. We have extracted protein sequences from all predicted open reading frames of 19 herpesvirus genomes. Sequence comparison and protein sequence clustering methods have been used to construct herpesvirus protein homologous families. This resulted in 1692 proteins being clustered into 243 multiprotein families and 196 singleton proteins. Predicted functions were assigned to each homologous family based on genome annotation and published data and each family classified into seven broad functional groups. Phylogenetic profiles were constructed for each herpesvirus from the homologous protein families and used to determine conserved functions and genomewide phylogenetic trees. These trees agreed with molecular-sequence-derived trees and allowed greater insight into the phylogeny of ungulate and murine gammaherpesviruses. PMID:11156614
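
    A phylogenetic profile records which homologous protein families each genome contains. The sketch below represents each profile as a set of family identifiers and compares genomes by Jaccard distance. The genome and family names are hypothetical placeholders, and Jaccard distance is an illustrative choice that is not necessarily the measure used in the study. A genome-wide tree would then be built from the distance matrix with a method such as neighbour joining or UPGMA, which is not shown.

```python
def profile_distance(profiles: dict[str, set[str]], g1: str, g2: str) -> float:
    """Jaccard distance between the protein-family sets of two genomes."""
    a, b = profiles[g1], profiles[g2]
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def distance_matrix(profiles: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """All pairwise distances, keyed by (genome, genome)."""
    names = sorted(profiles)
    return {(x, y): profile_distance(profiles, x, y) for x in names for y in names}


# Hypothetical presence/absence data: genome -> families it contains.
profiles = {
    "virus_A": {"fam1", "fam2", "fam3"},
    "virus_B": {"fam1", "fam2"},
    "virus_C": {"fam4"},
}
for pair, d in distance_matrix(profiles).items():
    print(pair, round(d, 3))
```

    Families shared by all genomes, such as a conserved core, do not distinguish genomes. Lineage-specific families drive the distances, which is why profile-based trees can complement trees built from individual gene sequences.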

  10. Epitopes of human testis-specific lactate dehydrogenase deduced from a cDNA sequence

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Millan, J.L.; Driscoll, C.E.; LeVan, K.M.

    The sequence and structure of human testis-specific L-lactate dehydrogenase (LDHC₄, LDHX; (L)-lactate:NAD⁺ oxidoreductase, EC 1.1.1.27) have been derived from analysis of a complementary DNA (cDNA) clone comprising the complete protein coding region of the enzyme. From the deduced amino acid sequence, human LDHC₄ is as different from rodent LDHC₄ (73% homology) as it is from human LDHA₄ (76% homology) and porcine LDHB₄ (68% homology). Subunit homologies are consistent with the conclusion that the LDHC gene arose by at least two independent duplication events. Furthermore, the lower degree of homology between mouse and human LDHC₄ and the appearance of this isozyme late in evolution suggest a higher rate of mutation in the mammalian LDHC genes than in the LDHA and LDHB genes. Comparison of exposed amino acid residues of discrete antigenic determinants of mouse and human LDHC₄ reveals significant differences. Knowledge of the human LDHC₄ sequence will help design human-specific peptides useful in the development of a contraceptive vaccine.

  11. Analysis of Structural MtrC Models Based on Homology with the Crystal Structure of MtrF

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Edwards, Marcus; Fredrickson, Jim K.; Zachara, John M.

    2012-12-01

    The outer-membrane decahaem cytochrome MtrC is part of the transmembrane MtrCAB complex required for mineral respiration by Shewanella oneidensis. MtrC has significant sequence similarity to the paralogous decahaem cytochrome MtrF, whose structure has been solved by X-ray crystallography. This now allows homology-based models of MtrC to be generated. The structure of these MtrC homology models contains ten bis-histidine-co-ordinated c-type haems arranged in a staggered cross through a four-domain structure. This model is consistent with current spectroscopic data and shows that the areas around haem 5 and haem 10, at the termini of an octahaem chain, are likely to have functions similar to those of the corresponding haems in MtrF. The electrostatic surfaces around haem 7, close to the β-barrels, are different in MtrF and MtrC, indicating that these haems may have different potentials and interact with substrates differently.

  12. Genomic DNA sequence and cytosine methylation changes of adult rice leaves after seeds space flight

    NASA Astrophysics Data System (ADS)

    Shi, Jinming

    In this study, changes in cytosine methylation at CCGG sites and in genomic DNA sequence in adult leaves of rice grown from seeds flown in space were detected by the methylation-sensitive amplification polymorphism (MSAP) and amplified fragment length polymorphism (AFLP) techniques, respectively. Rice seeds were planted in the trial field after a 4-day space flight on China's Shenzhou-6 spaceship. Adult leaves of space-treated rice, including 8 randomly chosen plants and 2 plants with phenotypic mutations, were used for AFLP and MSAP analysis. Polymorphisms in both DNA sequence and cytosine methylation were detected. For MSAP analysis, the average polymorphic frequencies of the on-ground controls, space-treated plants and mutants were 1.3%, 3.1% and 11%, respectively. For AFLP analysis, the corresponding average polymorphic frequencies were 1.4%, 2.9% and 8%. In total, 27 and 22 polymorphic fragments from the MSAP and AFLP analyses, respectively, were cloned and sequenced. Nine of the 27 fragments from MSAP analysis show homology to coding sequences. Of the 22 polymorphic fragments from AFLP analysis, none shows homology to mRNA sequences, and eight show homology to repeat regions or retrotransposon sequences. These results suggest that although both genomic DNA sequence and cytosine methylation status can be affected by space flight, the genomic regions homologous to the fragments from the DNA sequence and cytosine methylation analyses were different.

  13. Chimeric mitochondrial minichromosomes of the human body louse, Pediculus humanus: evidence for homologous and non-homologous recombination.

    PubMed

    Shao, Renfu; Barker, Stephen C

    2011-02-15

    The mitochondrial (mt) genome of the human body louse, Pediculus humanus, consists of 18 minichromosomes. Each minichromosome is 3 to 4 kb long and has 1 to 3 genes. There is unequivocal evidence for recombination between different mt minichromosomes in P. humanus. It is not known, however, how these minichromosomes recombine. Here, we report the discovery of eight chimeric mt minichromosomes in P. humanus. We classify these chimeric mt minichromosomes into two groups: Group I and Group II. Group I chimeric minichromosomes contain parts of two different protein-coding genes that are from different minichromosomes. The two parts of protein-coding genes in each Group I chimeric minichromosome are joined at a microhomologous nucleotide sequence; microhomologous nucleotide sequences are hallmarks of non-homologous recombination. Group II chimeric minichromosomes contain all of the genes and the non-coding regions of two different minichromosomes. The conserved sequence blocks in the non-coding regions of Group II chimeric minichromosomes resemble the "recombination repeats" in the non-coding regions of the mt genomes of higher plants. These repeats are essential to homologous recombination in higher plants. Our analyses of the nucleotide sequences of chimeric mt minichromosomes indicate both homologous and non-homologous recombination between minichromosomes in the mitochondria of the human body louse. Copyright © 2010 Elsevier B.V. All rights reserved.

  14. Evidence of protein-free homology recognition in magnetic bead force–extension experiments

    PubMed Central

    O'Lee, D. J.; Danilowicz, C.; Rochester, C.; Prentiss, M.

    2016-01-01

    Earlier theoretical studies have proposed that the homology-dependent pairing of large tracts of dsDNA may be due to physical interactions between homologous regions. Such interactions could contribute to the sequence-dependent pairing of chromosome regions that may occur in the presence or the absence of double-strand breaks. Several experiments have indicated the recognition of homologous sequences in pure electrolytic solutions without proteins. Here, we report single-molecule force experiments with a designed 60 kb long dsDNA construct, with one end attached to a solid surface and the other end to a magnetic bead. The 60 kb constructs contain two 10 kb long homologous tracts oriented head to head, so that their sequences match if the two tracts fold onto each other. The distance between the bead and the surface is measured as a function of the force applied to the bead. At low forces, the construct molecules extend substantially less than normal control dsDNA, indicating the existence of a preferential interaction between the homologous regions. Increasing the force causes continuous, rather than abrupt, unfolding of the paired homologous regions. Simple semi-phenomenological models of the unfolding mechanics are proposed, and their predictions are compared with the data. PMID:27493568

  15. The Comparative Genomics and Phylogenomics of Leishmania amazonensis Parasite.

    PubMed

    Tschoeke, Diogo A; Nunes, Gisele L; Jardim, Rodrigo; Lima, Joana; Dumaresq, Aline Sr; Gomes, Monete R; de Mattos Pereira, Leandro; Loureiro, Daniel R; Stoco, Patricia H; de Matos Guedes, Herbert Leonel; de Miranda, Antonio Basilio; Ruiz, Jeronimo; Pitaluga, André; Silva, Floriano P; Probst, Christian M; Dickens, Nicholas J; Mottram, Jeremy C; Grisard, Edmundo C; Dávila, Alberto Mr

    2014-01-01

    Leishmaniasis is an infectious disease caused by Leishmania species. Leishmania amazonensis is a New World Leishmania species belonging to the Mexicana complex, which is able to cause all types of leishmaniasis infections. The L. amazonensis reference strain MHOM/BR/1973/M2269 was sequenced, identifying 8,802 coding sequences (CDS), most of them of hypothetical function. Comparative analysis using six Leishmania species showed a core set of 7,016 orthologs. L. amazonensis and Leishmania mexicana share the largest number of distinct orthologs, while Leishmania braziliensis presented the largest number of inparalogs. Additionally, phylogenomic analysis confirmed the taxonomic position of L. amazonensis within the "Mexicana complex", reinforcing understanding of the split between New and Old World Leishmania. Potential non-homologous isofunctional enzymes (NISE) were identified between L. amazonensis and Homo sapiens that could provide new drug targets for development.
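    Ortholog sets are often seeded with reciprocal best hits (RBH) from all-against-all similarity searches. The abstract does not say which method produced the 7,016-gene core, so the sketch below only illustrates the RBH idea; the gene names and bit scores are invented.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject (first wins on ties)."""
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical bit scores between genes of species A and species B.
a_vs_b = {"a1": {"b1": 250.0, "b2": 90.0}, "a2": {"b2": 180.0}, "a3": {"b2": 120.0}}
b_vs_a = {"b1": {"a1": 245.0}, "b2": {"a2": 178.0, "a3": 121.0}}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1'), ('a2', 'b2')]
```

    Gene a3 hits b2 but loses to a2, so plain RBH drops it; recognizing it as a possible inparalog of a2 needs extra logic that this sketch omits.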

  16. Structural Studies of Geosmin Synthase, a Bifunctional Sesquiterpene Synthase with Alpha-Alpha Domain Architecture that Catalyzes a Unique Cyclization-Fragmentation Reaction Sequence

    PubMed Central

    Harris, Golda G.; Lombardi, Patrick M.; Pemberton, Travis A.; Matsui, Tsutomu; Weiss, Thomas M.; Cole, Kathryn E.; Köksal, Mustafa; Murphy, Frank V.; Vedula, L. Sangeetha; Chou, Wayne K.W.; Cane, David E.; Christianson, David W.

    2015-01-01

    Geosmin synthase from Streptomyces coelicolor (ScGS) catalyzes an unusual, metal-dependent terpenoid cyclization and fragmentation reaction sequence. Two distinct active sites are required for catalysis: the N-terminal domain catalyzes the ionization and cyclization of farnesyl diphosphate to form germacradienol and inorganic pyrophosphate (PPi), and the C-terminal domain catalyzes the protonation, cyclization, and fragmentation of germacradienol to form geosmin and acetone through a retro-Prins reaction. A unique αα domain architecture is predicted for ScGS based on amino acid sequence: each domain contains the metal-binding motifs typical of a class I terpenoid cyclase, and each domain requires Mg2+ for catalysis. Here, we report the X-ray crystal structure of the unliganded N-terminal domain of ScGS and the structure of its complex with 3 Mg2+ ions and alendronate. These structures highlight conformational changes required for active site closure and catalysis. Although neither full-length ScGS nor constructs of the C-terminal domain could be crystallized, homology models of the C-terminal domain were constructed based on ~36% sequence identity with the N-terminal domain. Small-angle X-ray scattering experiments yield low resolution molecular envelopes into which the N-terminal domain crystal structure and the C-terminal domain homology model were fit, suggesting possible αα domain architectures as frameworks for bifunctional catalysis. PMID:26598179
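    The ~36% figure is a percent identity between aligned domain sequences. Percent identity has several conventions (dividing by alignment length, by the shorter sequence, or by ungapped columns); the sketch below assumes the last, and the aligned fragments are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gap columns are excluded from the denominator
        compared += 1
        identical += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


print(percent_identity("MKV-LA", "MRVALA"))  # 80.0 (4 identical of 5 ungapped columns)
```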

  17. Site directed recombination

    DOEpatents

    Jurka, Jerzy W.

    1997-01-01

    Enhanced homologous recombination is obtained by employing a consensus sequence that has been found to be associated with integration of repeat sequences, such as Alu and ID. The consensus sequence, or a sequence differing from it by a single transition mutation, determines one site of a double break, which allows highly efficient integration at that site. By introducing single- or double-stranded DNA in which a flanking region having the consensus sequence is joined to a sequence of interest, one can reproducibly direct integration of the sequence of interest at one or a limited number of sites. In this way, specific sites can be identified, and homologous recombination can be achieved at the site by employing a second flanking sequence associated with a sequence proximal to the 3'-nick.
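    The matching rule above (the consensus itself, or a sequence differing from it by one transition, i.e. A to G or C to T and vice versa) can be expressed as a simple scan. The consensus "GGCCTT" below is a hypothetical placeholder, since the abstract does not give the sequence, and only the given strand is scanned.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def find_consensus_sites(seq, consensus):
    """(position, window) pairs matching consensus exactly or with one transition."""
    seq, consensus = seq.upper(), consensus.upper()
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        diffs = [(w, c) for w, c in zip(window, consensus) if w != c]
        if not diffs or (len(diffs) == 1 and diffs[0] in TRANSITIONS):
            hits.append((i, window))
    return hits


# "GGCCTT" is a placeholder, NOT the consensus sequence from the patent.
print(find_consensus_sites("TTGGCCTTAAGGCTTTCCGTCCTT", "GGCCTT"))
# [(2, 'GGCCTT'), (10, 'GGCTTT')]
```

    The window "GTCCTT" at position 18 also differs by one base, but G to T is a transversion, so it is rejected.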

  18. Bacterial RecA Protein Promotes Adenoviral Recombination during In Vitro Infection

    PubMed Central

    Lee, Jeong Yoon; Lee, Ji Sun; Materne, Emma C.; Rajala, Rahul; Ismail, Ashrafali M.; Seto, Donald; Dyer, David W.

    2018-01-01

    ABSTRACT Adenovirus infections in humans are common and sometimes lethal. Adenovirus-derived vectors are also commonly chosen for gene therapy in human clinical trials. We have shown in previous work that homologous recombination between adenoviral genomes of human adenovirus species D (HAdV-D), the largest and fastest-growing HAdV species, is responsible for the rapid evolution of this species. Because adenovirus infection initiates in mucosal epithelia, particularly at the gastrointestinal, respiratory, genitourinary, and ocular surfaces, we sought to determine a possible role for mucosal microbiota in adenovirus genome diversity. By analysis of known recombination hot spots across 38 human adenovirus genomes in species D (HAdV-D), we identified nucleotide sequence motifs similar to bacterial Chi sequences, which facilitate homologous recombination in the presence of bacterial Rec enzymes. These motifs, referred to here as ChiAD, were identified immediately 5′ to the sequence encoding penton base hypervariable loop 2, which expresses the arginine-glycine-aspartate moiety critical to adenoviral cellular entry. Coinfection with two HAdV-Ds in the presence of an Escherichia coli lysate increased recombination; this was blocked in a RecA mutant strain, E. coli DH5α, or upon RecA depletion. Recombination increased in the presence of E. coli lysate despite a general reduction in viral replication. RecA colocalized with viral DNA in HAdV-D-infected cell nuclei and was shown to bind specifically to ChiAD sequences. These results indicate that adenoviruses may repurpose bacterial recombination machinery, a sharing of evolutionary mechanisms across a diverse microbiota, and a unique example of viral commensalism. IMPORTANCE Adenoviruses are common human mucosal pathogens of the gastrointestinal, respiratory, and genitourinary tracts and the ocular surface. Here, we report finding Chi-like sequences in adenovirus recombination hot spots. 
Adenovirus coinfection in the presence of bacterial RecA protein facilitated homologous recombination between viruses. Genetic recombination led to evolution of an important external feature on the adenoviral capsid, namely, the penton base protein hypervariable loop 2, which contains the arginine-glycine-aspartic acid motif critical to viral internalization. We speculate that free Rec proteins present in gastrointestinal secretions upon bacterial cell death facilitate the evolution of human adenoviruses through homologous recombination, an example of viral commensalism and the complexity of virus-host interactions, including regional microbiota. PMID:29925671
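    Locating Chi-like motifs at recombination hot spots amounts to scanning both strands for a short sequence. The abstract does not give the ChiAD sequence, so this sketch uses the canonical E. coli Chi site, 5′-GCTGGTGG-3′, as a stand-in. The test sequence is hypothetical, and exact matching ignores the merely "similar" motifs the study describes.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_motif_both_strands(seq, motif):
    """(position, strand) for exact motif hits; '-' means a hit on the bottom strand."""
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        if window == rc:
            hits.append((i, "-"))
    return hits


CHI_E_COLI = "GCTGGTGG"  # stand-in; not the ChiAD motif from the study
print(find_motif_both_strands("AAGCTGGTGGTTCCACCAGCAA", CHI_E_COLI))
# [(2, '+'), (12, '-')]
```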

  19. Rapid construction of insulated genetic circuits via synthetic sequence-guided isothermal assembly

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Torella, JP; Boehm, CR; Lienert, F

    2013-12-28

    In vitro recombination methods have enabled one-step construction of large DNA sequences from multiple parts. Although synthetic biological circuits can in principle be assembled in the same fashion, they typically contain repeated sequence elements such as standard promoters and terminators that interfere with homologous recombination. Here we use a computational approach to design synthetic, biologically inactive unique nucleotide sequences (UNSes) that facilitate accurate ordered assembly. Importantly, our designed UNSes make it possible to assemble parts with repeated terminator and insulator sequences, and thereby create insulated functional genetic circuits in bacteria and mammalian cells. Using UNS-guided assembly to construct repeating promoter-gene-terminator parts, we systematically varied gene expression to optimize production of a deoxychromoviridans biosynthetic pathway in Escherichia coli. We then used this system to construct complex eukaryotic AND-logic gates for genomic integration into embryonic stem cells. Construction was performed by using a standardized series of UNS-bearing BioBrick-compatible vectors, which enable modular assembly and facilitate reuse of individual parts. UNS-guided isothermal assembly is broadly applicable to the construction and optimization of genetic circuits and particularly those requiring tight insulation, such as complex biosynthetic pathways, sensors, counters and logic gates.
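    The idea behind UNSes, spacers that repeat neither each other nor existing parts so that homologous overlaps are unambiguous, can be illustrated by rejection sampling. This sketch is not the authors' design algorithm: the length, k-mer size and GC window are arbitrary assumptions, and it does not screen for promoters or other biological activity.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def design_unique_spacers(n, length=40, k=12, gc_range=(0.4, 0.6),
                          avoid=(), seed=0, max_tries=100_000):
    """Rejection-sample n random spacers sharing no k-mer on either strand."""
    rng = random.Random(seed)
    used = set()
    for part in avoid:  # k-mers of existing parts that spacers must not reuse
        used |= kmers(part, k) | kmers(reverse_complement(part), k)
    designed = []
    for _ in range(max_tries):
        if len(designed) == n:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if not gc_range[0] <= gc_fraction(candidate) <= gc_range[1]:
            continue  # reject extreme GC content
        words = kmers(candidate, k) | kmers(reverse_complement(candidate), k)
        if words & used:
            continue  # reject overlap with earlier spacers or avoided parts
        designed.append(candidate)
        used |= words
    if len(designed) < n:
        raise RuntimeError("constraints could not be satisfied")
    return designed


for spacer in design_unique_spacers(3):
    print(spacer, round(gc_fraction(spacer), 2))
```

    Passing existing part sequences (for example, a terminator used several times) through `avoid` keeps new spacers from sharing any k-mer with them on either strand.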

  20. High-coverage methylation data of a gene model before and after DNA damage and homologous repair.

    PubMed

    Pezone, Antonio; Russo, Giusi; Tramontano, Alfonso; Florio, Ermanno; Scala, Giovanni; Landi, Rosaria; Zuchegna, Candida; Romano, Antonella; Chiariotti, Lorenzo; Muller, Mark T; Gottesman, Max E; Porcellini, Antonio; Avvedimento, Enrico V

    2017-04-11

    Genome-wide methylation analysis is limited by its low coverage and its inability to detect single variants below 10%. Quantitative analysis provides accurate information on the extent of methylation of single CpG dinucleotides, but it does not measure the actual polymorphism of the methylation profiles of single molecules. To understand the polymorphism of DNA methylation and to decode the methylation signatures before and after DNA damage and repair, we deep-sequenced bisulfite-treated DNA of a reporter gene undergoing site-specific DNA damage and homologous repair. In this paper, we provide information on the data generation, the rationale for the experiments and the types of assays used, such as cytofluorimetry and immunoblot data derived during a previous study published in Scientific Reports, describing the methylation and expression changes of a model gene (GFP) before and after formation of a double-strand break and repair by homologous recombination or non-homologous end joining. These data provide: 1) a reference for the analysis of methylation polymorphism at selected loci in complex cell populations; 2) a platform and the tools to compare transcription and methylation profiles.
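    The difference between per-CpG methylation levels and single-molecule methylation polymorphism can be shown with a toy example. Here each bisulfite read is reduced to a string of hypothetical calls, 'M' (methylated) or 'U' (unmethylated), at the same CpGs; two populations with identical per-CpG averages carry entirely different epialleles.

```python
from collections import Counter


def per_cpg_methylation(reads):
    """Fraction of reads methylated ('M') at each CpG position."""
    n = len(reads)
    return [sum(read[i] == "M" for read in reads) / n for i in range(len(reads[0]))]


def epiallele_frequencies(reads):
    """Frequency of each complete single-molecule methylation pattern."""
    counts = Counter(reads)
    return {pattern: c / len(reads) for pattern, c in sorted(counts.items())}


# Two hypothetical populations of bisulfite reads over the same four CpGs.
population_a = ["MMUU", "UUMM", "MMUU", "UUMM"]
population_b = ["MUMU", "UMUM", "MUMU", "UMUM"]

print(per_cpg_methylation(population_a))    # [0.5, 0.5, 0.5, 0.5]
print(per_cpg_methylation(population_b))    # [0.5, 0.5, 0.5, 0.5]
print(epiallele_frequencies(population_a))  # {'MMUU': 0.5, 'UUMM': 0.5}
print(epiallele_frequencies(population_b))  # {'MUMU': 0.5, 'UMUM': 0.5}
```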

  1. High-coverage methylation data of a gene model before and after DNA damage and homologous repair

    PubMed Central

    Pezone, Antonio; Russo, Giusi; Tramontano, Alfonso; Florio, Ermanno; Scala, Giovanni; Landi, Rosaria; Zuchegna, Candida; Romano, Antonella; Chiariotti, Lorenzo; Muller, Mark T.; Gottesman, Max E.; Porcellini, Antonio; Avvedimento, Enrico V.

    2017-01-01

    Genome-wide methylation analysis is limited by its low coverage and its inability to detect single variants below 10%. Quantitative analysis provides accurate information on the extent of methylation of single CpG dinucleotides, but it does not measure the actual polymorphism of the methylation profiles of single molecules. To understand the polymorphism of DNA methylation and to decode the methylation signatures before and after DNA damage and repair, we deep-sequenced bisulfite-treated DNA of a reporter gene undergoing site-specific DNA damage and homologous repair. In this paper, we provide information on the data generation, the rationale for the experiments and the types of assays used, such as cytofluorimetry and immunoblot data derived during a previous study published in Scientific Reports, describing the methylation and expression changes of a model gene (GFP) before and after formation of a double-strand break and repair by homologous recombination or non-homologous end joining. These data provide: 1) a reference for the analysis of methylation polymorphism at selected loci in complex cell populations; 2) a platform and the tools to compare transcription and methylation profiles. PMID:28398335

  2. Predicted secondary structure similarity in the absence of primary amino acid sequence homology: hepatitis B virus open reading frames.

    PubMed Central

    Schaeffer, E; Sninsky, J J

    1984-01-01

    Proteins that are related evolutionarily may have diverged at the level of primary amino acid sequence while maintaining similar secondary structures. Computer analysis has been used to compare the open reading frames of the hepatitis B virus to those of the woodchuck hepatitis virus at the level of amino acid sequence, and to predict the relative hydrophilic character and the secondary structure of putative polypeptides. Similarity is seen at the levels of relative hydrophilicity and secondary structure, in the absence of sequence homology. These data reinforce the proposal that these open reading frames encode viral proteins. Computer analysis of this type can be more generally used to establish structural similarities between proteins that do not share obvious sequence homology as well as to assess whether an open reading frame is fortuitous or codes for a protein. PMID:6585835
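    Hydropathy profiles of this kind are usually computed as a sliding-window average of per-residue scale values. The sketch below assumes the Kyte-Doolittle hydropathy scale (positive values mean hydrophobic, so it is the inverse sense of a hydrophilicity plot); the 1984 analysis may have used a different scale and window, and the test peptide is hypothetical.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Mean Kyte-Doolittle value for each full window along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    if len(values) < window:
        return []
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


profile = hydropathy_profile("IIIIIKKKKK", window=5)
print([round(v, 2) for v in profile])  # [4.5, 2.82, 1.14, -0.54, -2.22, -3.9]
```

    Comparing two such profiles position by position is one way to see similarity in hydropathic character even when the underlying residues differ.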

  3. Nucleotide sequence of the gene encoding the nitrogenase iron protein of Thiobacillus ferrooxidans

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pretorius, I.M.; Rawlings, D.E.; O'Neill, E.G.

    1987-01-01

    The DNA sequence was determined for the cloned Thiobacillus ferrooxidans nifH and part of the nifD genes. The DNA chains were radiolabeled with [α-32P]dCTP (3000 Ci/mmol) or [α-35S]dCTP (400 Ci/mmol). A putative T. ferrooxidans nifH promoter was identified whose sequences showed perfect consensus with those of the Klebsiella pneumoniae nif promoter. Two putative consensus upstream activator sequences were also identified. The amino acid sequence was deduced from the DNA sequence. In a comparison of nifH DNA sequences from T. ferrooxidans and eight other nitrogen-fixing microbes, a Rhizobium sp. isolated from Parasponia andersonii showed the greatest homology (74%) and Clostridium pasteurianum (nifH1) showed the least homology (54%). In the comparison of the amino acid sequences of the Fe proteins, the Rhizobium sp. and Rhizobium japonicum showed the greatest homology (both 86%) and C. pasteurianum (nifH1 gene product) demonstrated the least homology (56%) to the T. ferrooxidans Fe protein.

  4. Structure and expression of the human thymocyte antigens CD1a, CD1b, and CD1c

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Martin, L.H.; Calabi, F.; Lefebvre, F.A.

    1987-12-01

    The CD1 human antigens are a family of at least three components, CD1a, CD1b, and CD1c, that are characteristic of the cortical stage of thymocyte maturation. CD1a was originally named HTA1 or T6 and was thought to be the human equivalent of mouse Tla. The genes coding for all three have now been identified by transfection into mouse cells. The transfectants express the surface antigens, which can then be recognized by the corresponding cluster of monoclonal antibodies used to define the three members of CD1. The full sequence of the genomic DNA is described for all three. The intron-exon structure of CD1a is deduced by comparison with a near-full-length cDNA clone. Similar structures are proposed for the other two, largely based on sequence homology. An unusually long 5'-untranslated exon (280 bases long) is highly conserved among the three genes, suggesting an important but unknown function. CD1c has a duplicated form of this exon that is thought to be spliced out. The major homology among the three antigens is in the β2-microglobulin-binding domain. The general relatedness to major histocompatibility complex class I and class II molecules is significant but low, with no section of higher homology to mouse Tla.

  5. Structural changes in the BH3 domain of SOUL protein upon interaction with the anti-apoptotic protein Bcl-xL

    PubMed Central

    Ambrosi, Emmanuele; Capaldi, Stefano; Bovi, Michele; Saccomani, Gianmaria; Perduca, Massimiliano; Monaco, Hugo L.

    2011-01-01

    The SOUL protein is known to induce apoptosis by provoking the mitochondrial permeability transition, and a sequence homologous with the BH3 (Bcl-2 homology 3) domains has recently been identified in the protein, thus making it a potential new member of the BH3-only protein family. In the present study, we provide NMR, SPR (surface plasmon resonance) and crystallographic evidence that a peptide spanning residues 147–172 in SOUL interacts with the anti-apoptotic protein Bcl-xL. We have crystallized SOUL alone and the complex of its BH3 domain peptide with Bcl-xL, and solved their three-dimensional structures. The SOUL monomer is a single domain organized as a distorted β-barrel with eight anti-parallel strands and two α-helices. The BH3 domain extends across 15 residues at the end of the second helix and eight amino acids in the chain following it. There are important structural differences in the BH3 domain in the intact SOUL molecule and the same sequence bound to Bcl-xL. PMID:21639858

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ryan, Q.C.

    There are two nonallelic human γ-globin genes located on the short arm of chromosome 11 in the order 5′-Gγ-Aγ-3′. Various modifications of the two γ genes have been reported, including deletions, triplications, quadruplications and, recently, a quintuplication. These are generally created by one or more unequal crossovers in the γ-globin gene regions of adjacent chromosomes. During a search for a γ° thalassemia, which might be due to a crossover involving the γ genes, two cases were found in family W. Bgl II mapping studies showed a 5 kb deletion at the γ-gene loci in these individuals. The Bgl II fragment from the γ-gene loci of R.W. was cloned into the phage vector QR1. Phage mapping showed that two of the three Pst I sites within the Bgl II fragment were missing, which suggested that the crossover might have occurred within the γ gene, possibly within the γ IVS II region. Sequence analysis of the cloned fragment revealed an unusual sequence that had no homology with the γ-gene region except for a small 264 bp region near the 3′ end. The orientation of the 264 bp fragment is inverted relative to the homologous sequences in the Gγ and Aγ IVS II. The unusual sequence was computer-analyzed for homology with every DNA sequence file in the EMBL database and GenBank and showed no significant homology to any available DNA sequence except for the 264 bp γ IVS II homology.
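    Restriction mapping, as in the Bgl II and Pst I analysis above, predicts fragment sizes from recognition sites. The sketch below uses the standard recognition sequences A^GATCT (Bgl II) and CTGCA^G (Pst I), reports top-strand cut coordinates on a linear molecule and ignores overhangs; the test sequences are hypothetical. Both sites are palindromic, so scanning one strand finds every site.

```python
ENZYMES = {"BglII": ("AGATCT", 1), "PstI": ("CTGCAG", 5)}  # (site, top-strand cut offset)


def cut_positions(seq, site, offset):
    """Top-strand cut coordinates for every occurrence of site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, enzyme):
    """Fragment lengths from a complete digest of a linear molecule."""
    site, offset = ENZYMES[enzyme]
    cuts = cut_positions(seq.upper(), site, offset)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


print(digest("TTCTGCAGTTTTTCTGCAGTT", "PstI"))  # [7, 11, 3]
```

    Losing a site, as with the two missing Pst I sites described above, merges the neighbouring fragments into one larger band.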

  7. Congenital hypothyroidism mutations affect common folding and trafficking in the α/β-hydrolase fold proteins

    PubMed Central

    De Jaco, Antonella; Dubi, Noga; Camp, Shelley; Taylor, Palmer

    2017-01-01

    The α/β-hydrolase fold superfamily of proteins is composed of structurally related members that, despite great diversity in their catalytic, recognition, adhesion and chaperone functions, share a common fold governed by homologous residues and conserved disulfide bridges. Non-synonymous single nucleotide polymorphisms within the α/β-hydrolase fold domain in various family members have been found for congenital endocrine, metabolic and nervous system disorders. By examining the amino acid sequence from the various proteins, mutations were found to be prevalent in conserved residues within the α/β-hydrolase fold of the homologous proteins. This is the case for the thyroglobulin mutations linked to congenital hypothyroidism. To address whether correct folding of the common domain is required for protein export, we inserted the thyroglobulin mutations at homologous positions in two correlated but simpler α/β-hydrolase fold proteins known to be exported to the cell surface: neuroligin3 and acetylcholinesterase. Here we show that these mutations in the cholinesterase homologous region alter the folding properties of the α/β-hydrolase fold domain, which are reflected in defects in protein trafficking, folding and function, and ultimately result in retention of the partially processed proteins in the endoplasmic reticulum. Accordingly, mutations at conserved residues may be transferred amongst homologous proteins to produce common processing defects despite disparate functions, protein complexity and tissue-specific expression of the homologous proteins. More importantly, a similar assembly of the α/β-hydrolase fold domain tertiary structure among homologous members of the superfamily is required for correct trafficking of the proteins to their final destination. PMID:23035660

  8. Detection of Bacillus anthracis DNA in Complex Soil and Air Samples Using Next-Generation Sequencing

    PubMed Central

    Be, Nicholas A.; Thissen, James B.; Gardner, Shea N.; McLoughlin, Kevin S.; Fofanov, Viacheslav Y.; Koshinsky, Heather; Ellingson, Sally R.; Brettin, Thomas S.; Jackson, Paul J.; Jaing, Crystal J.

    2013-01-01

    Bacillus anthracis is the potentially lethal etiologic agent of anthrax disease, and is a significant concern in the realm of biodefense. One of the cornerstones of an effective biodefense strategy is the ability to detect infectious agents with a high degree of sensitivity and specificity in the context of a complex sample background. The nature of the B. anthracis genome, however, renders specific detection difficult, due to close homology with B. cereus and B. thuringiensis. We therefore elected to determine the efficacy of next-generation sequencing analysis and microarrays for detection of B. anthracis in an environmental background. We applied next-generation sequencing to titrated genome copy numbers of B. anthracis in the presence of background nucleic acid extracted from aerosol and soil samples. We found next-generation sequencing to be capable of detecting as few as 10 genomic equivalents of B. anthracis DNA per nanogram of background nucleic acid. Detection was accomplished by mapping reads to either a defined subset of reference genomes or to the full GenBank database. Moreover, sequence data obtained from B. anthracis could be reliably distinguished from sequence data mapping to either B. cereus or B. thuringiensis. We also demonstrated the efficacy of a microbial census microarray in detecting B. anthracis in the same samples, representing a cost-effective and high-throughput approach, complementary to next-generation sequencing. Our results, in combination with the capacity of sequencing for providing insights into the genomic characteristics of complex and novel organisms, suggest that these platforms should be considered important components of a biosurveillance strategy. PMID:24039948
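    The difficulty posed by close homology can be seen in a simplified read classifier. This sketch is not the study's pipeline, which mapped reads to reference genomes; it assigns each read by exact shared k-mers on one strand, and the two toy references stand in for genomes that share much of their sequence. Reads that hit more than one reference are reported as ambiguous, which is what would happen to reads from regions shared by B. anthracis, B. cereus and B. thuringiensis.

```python
def kmer_set(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_reads(reads, references, k):
    """Count reads whose k-mers hit exactly one, several, or no references."""
    ref_kmers = {name: kmer_set(seq, k) for name, seq in references.items()}
    counts = dict.fromkeys(references, 0)
    counts["ambiguous"] = 0
    counts["unassigned"] = 0
    for read in reads:
        read_kmers = kmer_set(read, k)
        hits = [name for name, ks in ref_kmers.items() if read_kmers & ks]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif hits:
            counts["ambiguous"] += 1  # shared sequence: cannot tell the references apart
        else:
            counts["unassigned"] += 1
    return counts


references = {"ref_A": "AAAACCCCGGGG", "ref_B": "AAAACCCCTTTT"}  # hypothetical
reads = ["CCGGG", "CCTTT", "AAACC", "GTGTG"]
print(classify_reads(reads, references, k=4))
# {'ref_A': 1, 'ref_B': 1, 'ambiguous': 1, 'unassigned': 1}
```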

  9. Prostate Cell Specific Regulation of Androgen Receptor Phosphorylation in Vivo

    DTIC Science & Technology

    2009-11-01

    includes both Rpb5, a subunit shared by RNA polymerase (Pol) I, II, and III, and the corepressor, Unconventional prefoldin Rpb5-Interactor (URI/C19orf2 ... complex that contains RNA polymerase II subunit 5, a subunit shared by all three RNA polymerases; unconventional prefoldin RPB5-interactor (URI), which ... sequence of ART-27 is conserved throughout evolution from worms to humans and its predicted protein structure is homologous to the prefoldin -a family of

  10. Homology modeling a fast tool for drug discovery: current perspectives.

    PubMed

    Vyas, V K; Ukawala, R D; Ghate, M; Chintha, C

    2012-01-01

    A major goal of structural biology involves the formation of protein-ligand complexes, in which the protein molecules act energetically in the course of binding. Therefore, an understanding of protein-ligand interaction is very important for structure-based drug design. Lack of knowledge of 3D structures has hindered efforts to understand the binding specificities of ligands with proteins. With the increasing availability of modeling software and the growing number of known protein structures, homology modeling is rapidly becoming the method of choice for obtaining 3D coordinates of proteins. Homology modeling is a representation of the similarity of environmental residues at topologically corresponding positions in the reference proteins. In the absence of experimental data, model building on the basis of a known 3D structure of a homologous protein is at present the only reliable method to obtain structural information. Knowledge of the 3D structures of proteins provides invaluable insights into the molecular basis of their functions. Recent advances in homology modeling, particularly in detecting and aligning sequences with template structures, detecting distant homologues, modeling loops and side chains, and detecting errors in a model, have made consistent prediction of protein structure possible, which was not possible even several years ago. This review focuses on the features and role of homology modeling in predicting protein structure, and describes current developments in this field with successful applications at the different stages of drug design and discovery.

  11. Homology Modeling a Fast Tool for Drug Discovery: Current Perspectives

    PubMed Central

    Vyas, V. K.; Ukawala, R. D.; Ghate, M.; Chintha, C.

    2012-01-01

    A major goal of structural biology involves the formation of protein-ligand complexes, in which the protein molecules act energetically in the course of binding. Therefore, an understanding of protein-ligand interaction is very important for structure-based drug design. Lack of knowledge of 3D structures has hindered efforts to understand the binding specificities of ligands with proteins. With the increasing availability of modeling software and the growing number of known protein structures, homology modeling is rapidly becoming the method of choice for obtaining 3D coordinates of proteins. Homology modeling is a representation of the similarity of environmental residues at topologically corresponding positions in the reference proteins. In the absence of experimental data, model building on the basis of a known 3D structure of a homologous protein is at present the only reliable method to obtain structural information. Knowledge of the 3D structures of proteins provides invaluable insights into the molecular basis of their functions. Recent advances in homology modeling, particularly in detecting and aligning sequences with template structures, detecting distant homologues, modeling loops and side chains, and detecting errors in a model, have made consistent prediction of protein structure possible, which was not possible even several years ago. This review focuses on the features and role of homology modeling in predicting protein structure, and describes current developments in this field with successful applications at the different stages of drug design and discovery. PMID:23204616

  12. Single-cell template strand sequencing by Strand-seq enables the characterization of individual homologs.

    PubMed

    Sanders, Ashley D; Falconer, Ester; Hills, Mark; Spierings, Diana C J; Lansdorp, Peter M

    2017-06-01

    The ability to distinguish between genome sequences of homologous chromosomes in single cells is important for studies of copy-neutral genomic rearrangements (such as inversions and translocations), building chromosome-length haplotypes, refining genome assemblies, mapping sister chromatid exchange events and exploring cellular heterogeneity. Strand-seq is a single-cell sequencing technology that resolves the individual homologs within a cell by restricting sequence analysis to the DNA template strands used during DNA replication. This protocol, which takes up to 4 d to complete, relies on the directionality of DNA, in which each single strand of a DNA molecule is distinguished based on its 5'-3' orientation. Culturing cells in a thymidine analog for one round of cell division labels nascent DNA strands, allowing for their selective removal during genomic library construction. To preserve directionality of template strands, genomic preamplification is bypassed and labeled nascent strands are nicked and not amplified during library preparation. Each single-cell library is multiplexed for pooling and sequencing, and the resulting sequence data are aligned, mapping to either the minus or plus strand of the reference genome, to assign template strand states for each chromosome in the cell. The major adaptations to conventional single-cell sequencing protocols include harvesting of daughter cells after a single round of BrdU incorporation, bypassing of whole-genome amplification, and removal of the BrdU+ strand during Strand-seq library preparation. By sequencing just template strands, the structure and identity of each homolog are preserved.
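    Assigning a template strand state amounts to comparing read counts on the two reference strands for each chromosome. The sketch below uses the Watson (minus) / Crick (plus) labels common in Strand-seq work: two minus templates give WW, two plus templates give CC, and one of each gives WC. The 0.8 cutoff, the 20-read minimum and the counts are arbitrary assumptions, not values from the protocol.

```python
def call_template_state(minus_reads, plus_reads, homozygous_cutoff=0.8, min_reads=20):
    """Classify one chromosome as WW, CC or WC from minus/plus read counts."""
    total = minus_reads + plus_reads
    if total < min_reads:
        return "low_coverage"
    frac_minus = minus_reads / total
    if frac_minus >= homozygous_cutoff:
        return "WW"  # both inherited templates map to the minus strand
    if frac_minus <= 1 - homozygous_cutoff:
        return "CC"  # both inherited templates map to the plus strand
    return "WC"      # one template of each orientation


counts = {"chr1": (95, 5), "chr2": (3, 97), "chr3": (48, 52), "chr4": (5, 5)}
for chrom, (minus, plus) in counts.items():
    print(chrom, call_template_state(minus, plus))
# chr1 WW
# chr2 CC
# chr3 WC
# chr4 low_coverage
```

    Only WC chromosomes separate the two homologs into distinct strands, which is what lets the method resolve individual homologs.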

  13. Homologous Recombination—Experimental Systems, Analysis and Significance

    PubMed Central

    Kuzminov, Andrei

    2014-01-01

    Homologous recombination is the most complex of all recombination events that shape genomes and produce material for evolution. Homologous recombination events are exchanges between DNA molecules in the lengthy regions of shared identity, catalyzed by a group of dedicated enzymes. There is a variety of experimental systems in E. coli and Salmonella to detect homologous recombination events of several different kinds. Genetic analysis of homologous recombination reveals three separate phases of this process: pre-synapsis (the early phase), synapsis (homologous strand exchange) and post-synapsis (the late phase). In E. coli, there are at least two independent pathways of the early phase and at least two independent pathways of the late phase. All this complexity is incongruent with the originally ascribed role of homologous recombination as an accelerator of genome evolution: there is simply not enough duplication and repetition in enterobacterial genomes for homologous recombination to have a detectable evolutionary role, and therefore not enough selection to maintain such complexity. At the same time, the mechanisms of homologous recombination are uniquely suited for repair of complex DNA lesions called chromosomal lesions. In fact, the two major classes of chromosomal lesions are recognized and processed by the two individual pathways at the early phase of homologous recombination. It follows, therefore, that homologous recombination events are occasional reflections of the continual recombinational repair, made possible in cases of natural or artificial genome redundancy. PMID:26442506

  14. Homologation and functionalization of carbon monoxide by a recyclable uranium complex.

    PubMed

    Gardner, Benedict M; Stewart, John C; Davis, Adrienne L; McMaster, Jonathan; Lewis, William; Blake, Alexander J; Liddle, Stephen T

    2012-06-12

    Carbon monoxide (CO) is in principle an excellent resource from which to produce industrial hydrocarbon feedstocks as alternatives to crude oil; however, CO has proven remarkably resistant to selective homologation, and the few complexes that can effect this transformation cannot be recycled because liberation of the homologated product destroys the complexes or they are substitutionally inert. Here, we show that under mild conditions a simple triamidoamine uranium(III) complex can reductively homologate CO and be recycled for reuse. Following treatment with organosilyl halides, bis(organosiloxy)acetylenes, which readily convert to furanones, are produced, and this was confirmed by the use of isotopically 13C-labeled CO. The precursor to the triamido uranium(III) complex is formed concomitantly. These findings establish that, under appropriate conditions, uranium(III) can mediate a complete synthetic cycle for the homologation of CO to higher derivatives. This work may prove useful in spurring wider efforts in CO homologation, and the simplicity of this system suggests that catalytic CO functionalization may soon be within reach.

  15. Homologation and functionalization of carbon monoxide by a recyclable uranium complex

    PubMed Central

    Gardner, Benedict M.; Stewart, John C.; Davis, Adrienne L.; McMaster, Jonathan; Lewis, William; Blake, Alexander J.; Liddle, Stephen T.

    2012-01-01

    Carbon monoxide (CO) is in principle an excellent resource from which to produce industrial hydrocarbon feedstocks as alternatives to crude oil; however, CO has proven remarkably resistant to selective homologation, and the few complexes that can effect this transformation cannot be recycled because liberation of the homologated product destroys the complexes or they are substitutionally inert. Here, we show that under mild conditions a simple triamidoamine uranium(III) complex can reductively homologate CO and be recycled for reuse. Following treatment with organosilyl halides, bis(organosiloxy)acetylenes, which readily convert to furanones, are produced, and this was confirmed by the use of isotopically 13C-labeled CO. The precursor to the triamido uranium(III) complex is formed concomitantly. These findings establish that, under appropriate conditions, uranium(III) can mediate a complete synthetic cycle for the homologation of CO to higher derivatives. This work may prove useful in spurring wider efforts in CO homologation, and the simplicity of this system suggests that catalytic CO functionalization may soon be within reach. PMID:22652572

  16. Major histocompatibility complex variation in the endangered Przewalski's horse.

    PubMed Central

    Hedrick, P W; Parker, K M; Miller, E L; Miller, P S

    1999-01-01

    The major histocompatibility complex (MHC) is a fundamental part of the vertebrate immune system, and the high variability in many MHC genes is thought to play an essential role in recognition of parasites. The Przewalski's horse is extinct in the wild and all the living individuals descend from 13 founders, most of whom were captured around the turn of the century. One of the primary genetic concerns in endangered species is whether they have ample adaptive variation to respond to novel selective factors. In examining 14 Przewalski's horses that are broadly representative of the living animals, we found six different class II DRB major histocompatibility sequences. The sequences showed extensive nonsynonymous variation, concentrated in the putative antigen-binding sites, and little synonymous variation. Individuals had from two to four sequences as determined by single-stranded conformation polymorphism (SSCP) analysis. On the basis of the SSCP data, phylogenetic analysis of the nucleotide sequences, and segregation in a family group, we conclude that four of these sequences are from one gene (although one sequence codes for a nonfunctional allele because it contains a stop codon) and two other sequences are from another gene. The position of the stop codon is at the same amino-acid position as in a closely related sequence from the domestic horse. Because other organisms have extensive variation at homologous loci, the Przewalski's horse may have quite low variation in this important adaptive region. PMID:10430594

  17. Chemical property based sequence characterization of PpcA and its homolog proteins PpcB-E: A mathematical approach

    PubMed Central

    Pal Choudhury, Pabitra

    2017-01-01

    Periplasmic c7-type cytochrome A (PpcA) is found in Geobacter sulfurreducens along with four homologs (PpcB-E). Crystal structures show that PpcA can bind deoxycholate (DXCA), whereas its homologs do not, but the reason for this has not yet been established with certainty from primary sequence information. This study analyzes primary protein sequences through the chemical properties of their constituent amino acids. First, we compute chemical-group-specific scores for the amino acids. Along with this, we have developed a new method of phylogenetic analysis based on chemical-group dissimilarities between amino acids; applied to the cytochrome c7 family members, it pinpoints how a particular sequence differs from the others. Second, we build a graph-theoretic model from the amino acid sequences, which is also applied to the cytochrome c7 family members, and highlight some unique characteristics and their domains. Third, we search for unique patterns (subsequences) common to the group or specific to an individual member. In all cases, we identify distinct features of PpcA that set it apart from its homologs and are consistent with its binding to deoxycholate. Similarly, some notable features of the structurally dissimilar protein PpcD relative to the other homologs are brought out. Finally, because the five cytochrome family members are homologs, they are expected to share some significant features, which are also enumerated in this study. PMID:28362850
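    The idea of comparing sequences through the chemical groups of their amino acids can be sketched as a group-composition profile plus a simple distance. This is an illustrative sketch only: the six-way grouping in `GROUPS` is a common coarse side-chain scheme assumed here, not necessarily the grouping used in the study, and `group_profile` / `profile_distance` are hypothetical helper names. A matrix of such pairwise distances could in principle feed a distance-based tree method, but this is not a reconstruction of the authors' methodology.

```python
from typing import Dict

# A coarse side-chain grouping chosen for illustration (an assumption,
# not necessarily the chemical groups used in the study).
GROUPS: Dict[str, str] = {
    **dict.fromkeys("DE", "acidic"),
    **dict.fromkeys("KRH", "basic"),
    **dict.fromkeys("FWY", "aromatic"),
    **dict.fromkeys("AVLIM", "aliphatic"),
    **dict.fromkeys("STNQC", "polar"),
    **dict.fromkeys("GP", "special"),
}
GROUP_NAMES = sorted(set(GROUPS.values()))


def group_profile(seq: str) -> Dict[str, float]:
    """Fraction of residues in each chemical group (unknown letters skipped)."""
    residues = [r for r in seq.upper() if r in GROUPS]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return {g: sum(GROUPS[r] == g for r in residues) / len(residues)
            for g in GROUP_NAMES}


def profile_distance(a: str, b: str) -> float:
    """Manhattan distance between two group profiles (0 = same composition)."""
    pa, pb = group_profile(a), group_profile(b)
    return sum(abs(pa[g] - pb[g]) for g in GROUP_NAMES)
```

    Because the profile is composition-only, it ignores residue order; any order-aware comparison (such as the subsequence patterns mentioned above) would need a different representation.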

  18. Purification and partial characterization of a lectin protein complex, the clathrilectin, from the calcareous sponge Clathrina clathrus.

    PubMed

    Gardères, Johan; Domart-Coulon, Isabelle; Marie, Arul; Hamer, Bojan; Batel, Renato; Müller, Werner E G; Bourguet-Kondracki, Marie-Lise

    2016-10-01

    Carbohydrate-binding proteins were purified from the marine calcareous sponge Clathrina clathrus via affinity chromatography on lactose- and N-acetylglucosamine-agarose resins. Proteomic analysis of protein subunits separated on acrylamide gels under reducing conditions identified several lectin candidates. Based on amino-acid sequence similarity, two peptides displayed homology with the jack bean lectin Concanavalin A, including a conserved domain shared by proteins in the L-type lectin superfamily. An N-acetylglucosamine-binding protein complex, named clathrilectin, was further purified via gel filtration chromatography, bioguided with a diagnostic rabbit erythrocyte haemagglutination assay, and its activity was found to be calcium dependent. Clathrilectin, a protein complex of 3200 kDa estimated by gel filtration, is composed of monomers with apparent molecular masses of 208 and 180 kDa estimated on 10% SDS-PAGE. Nine internal peptides were identified using proteomic analyses and compared to protein libraries from the demosponge Amphimedon queenslandica and a calcareous sponge Sycon sp. from the Adriatic Sea. Clathrilectin is the first lectin isolated from a calcareous sponge and displays homologies with predicted sponge proteins potentially involved in cell aggregation and interaction with bacteria. Copyright © 2016 Elsevier Inc. All rights reserved.

  19. Homologous recombination within the capsid gene of porcine circovirus type 2 subgroup viruses via natural co-infection

    USDA-ARS's Scientific Manuscript database

    Several studies have reported homologous recombination between porcine circovirus type 2 (PCV2)-group 1 (Gp1) and -group 2 (Gp2) viruses. Interestingly, the recombination events described thus far mapped either within the Rep gene sequences or the sequences flanking the Rep gene region. Previously, ...

  20. Homology and the optimization of DNA sequence data

    NASA Technical Reports Server (NTRS)

    Wheeler, W.

    2001-01-01

    Three methods of nucleotide character analysis are discussed. Their implications for molecular sequence homology and phylogenetic analysis are compared. The criterion of inter-data-set congruence, both character-based and topological, is applied to two data sets to elucidate and potentially discriminate among these parsimony-based ideas. © 2001 The Willi Hennig Society.
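    Parsimony-based character analysis rests on counting the minimum number of state changes a character requires on a given tree. As a hedged illustration of that building block, the sketch below implements the standard Fitch small-parsimony pass for one aligned nucleotide column on a fixed, rooted binary tree. It is a textbook method swapped in for illustration, not a reconstruction of any of the three methods compared in the paper; the tree encoding (nested tuples of leaf names) is an assumption of this sketch.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf name, or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch small-parsimony pass for one character (one alignment column).

    Returns the candidate ancestral state set at the root of `tree` and the
    minimum number of state changes needed on that topology.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared state: take the union and charge one change.
    return left_set | right_set, left_cost + right_cost + 1


column = {"a": "A", "b": "G", "c": "A", "d": "G"}
print(fitch((("a", "b"), ("c", "d")), column)[1])  # 2
print(fitch((("a", "c"), ("b", "d")), column)[1])  # 1
```

    Summing this cost over all columns gives a tree's parsimony length; comparing lengths or topologies across data sets is where congruence criteria such as those discussed above would come in.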

  1. Establishing homologies in protein sequences

    NASA Technical Reports Server (NTRS)

    Dayhoff, M. O.; Barker, W. C.; Hunt, L. T.

    1983-01-01

    Computer-based statistical techniques used to determine homologies between proteins occurring in different species are reviewed. The technique is based on comparison of two protein sequences, either by relating all segments of a given length in one sequence to all segments of the second or by finding the best alignment of the two sequences. Approaches discussed include selection using printed tabulations, identification of very similar sequences, and computer searches of a database. The use of the SEARCH, RELATE, and ALIGN programs (Dayhoff, 1979) is explained; sample data are presented in graphs, diagrams, and tables and the construction of scoring matrices is considered.
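    The first approach described, relating all segments of a given length in one sequence to all segments of the second, can be sketched as an exhaustive scan of segment pairs. This is a simplified illustration in the spirit of that comparison, not the RELATE program itself: the +1/0 identity scoring in `identity_score` is a stand-in assumed here for a real substitution matrix such as Dayhoff's PAM250, and the statistical step of comparing the resulting score distribution with scores from randomized sequences is not implemented.

```python
from typing import Callable, List


def identity_score(x: str, y: str) -> int:
    # Stand-in for a substitution matrix such as PAM250 (an assumption).
    return 1 if x == y else 0


def segment_scores(a: str, b: str, k: int,
                   score: Callable[[str, str], int] = identity_score) -> List[int]:
    """Score every length-k segment of `a` against every length-k segment of `b`."""
    if k <= 0:
        raise ValueError("segment length must be positive")
    scores = []
    for i in range(len(a) - k + 1):
        for j in range(len(b) - k + 1):
            scores.append(sum(score(a[i + t], b[j + t]) for t in range(k)))
    return scores


print(segment_scores("GFTAIW", "TAIWG", 4))  # [0, 0, 0, 0, 4, 0]
```

    The number of comparisons grows with the product of the sequence lengths, which is why such segment comparisons were paired with statistical summaries rather than inspected one by one.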

  2. Complete genome sequence analysis of a duck circovirus from Guangxi pockmark ducks.

    PubMed

    Xie, Liji; Xie, Zhixun; Zhao, Guangyuan; Liu, Jiabo; Pang, Yaoshan; Deng, Xianwen; Xie, Zhiqin; Fan, Qing

    2012-12-01

    We report here the complete genomic sequence of a novel duck circovirus (DuCV) strain, GX1104, isolated from Guangxi pockmark ducks in Guangxi, China. The whole nucleotide sequence had the highest homology (97.2%) with the sequence of strain TC/2002 (GenBank accession number AY394721.1) and had a low homology (76.8% to 78.6%) with the sequences of other strains isolated from China, Germany, and the United States. This report will help to understand the epidemiology and molecular characteristics of Guangxi pockmark duck circovirus in southern China.
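    Percentages like the 97.2% reported here are typically computed as percent identity over a pairwise alignment. Conventions differ on how gaps are treated; this minimal sketch assumes the two sequences are already aligned (equal length, '-' for gaps) and simply excludes any column containing a gap. The example sequences are hypothetical and are not the strains discussed in the abstract.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over aligned, gap-free columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap (one convention).
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x.upper() == y.upper())
    return 100.0 * matches / len(pairs)


print(percent_identity("ACGT-A", "ACGTTC"))  # 80.0
```

    Counting gapped columns in the denominator instead would give a lower figure for the same alignment, so reported values are only comparable when the convention is known.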

  3. Invariant glycines and prolines flanking in loops the strand beta 2 of various (alpha/beta)8-barrel enzymes: a hidden homology?

    PubMed Central

    Janecek, S.

    1996-01-01

    The question of parallel (alpha/beta)8-barrel fold evolution remains unclear, owing mainly to the lack of sequence homology throughout the amino acid sequences of (alpha/beta)8-barrel enzymes. The "classical" approaches used in the search for homologies among (alpha/beta)8-barrels (e.g., production of structurally based alignments) have yielded alignments perfect from the structural point of view, but the approaches have been unable to reveal the homologies. These are proposed to be "hidden" in (alpha/beta)8-barrel enzymes. The term "hidden homology" means that the alignment of sequence stretches proposed to be homologous need not be structurally fully satisfactory. This is due to the very long evolutionary history of all (alpha/beta)8-barrels. This work identifies so-called hidden homology around the strand beta 2 that is flanked by loops containing invariant glycines and prolines in 17 different (alpha/beta)8-barrel enzymes, i.e., roughly in half of all currently known (alpha/beta)8-barrel proteins. The search was based on the idea that a conserved sequence region of an (alpha/beta)8-barrel enzyme should be more or less conserved also in the equivalent part of the structure of the other enzymes with this folding motif, given their mutual evolutionary relatedness. For this purpose, the sequence region around the well-conserved second beta-strand of alpha-amylase flanked by the invariant glycine and proline (56_GFTAIWITP, Aspergillus oryzae alpha-amylase numbering), was used as the sequence-structural template. The proposal that the second beta-strand of (alpha/beta)8-barrel fold is important from the evolutionary point of view is strongly supported by the increasing trend of the observed beta 2-strand structural similarity for the pairs of (alpha/beta)8-barrel enzymes: alpha-amylase and the alpha-subunit of tryptophan synthase, alpha-amylase and mandelate racemase, and alpha-amylase and cyclodextrin glycosyltransferase. 
This trend is also in agreement with the existing evolutionary division of the entire family of (alpha/beta)8-barrel proteins. PMID:8762144
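    Using a short conserved stretch as a template, as done here with the `56_GFTAIWITP` region of Aspergillus oryzae alpha-amylase, can be illustrated at the sequence level by sliding the template along another sequence and keeping the best ungapped match. This toy sketch covers only that step: the study's comparisons also relied on structural equivalence, which is not modeled, and the helper name `best_window` and the example sequences are hypothetical.

```python
from typing import Tuple

TEMPLATE = "GFTAIWITP"  # template residues quoted in the abstract


def best_window(seq: str, template: str = TEMPLATE) -> Tuple[int, int]:
    """Return (1-based start, identical positions) of the best ungapped window."""
    k = len(template)
    if len(seq) < k:
        raise ValueError("sequence shorter than template")
    best_start, best_hits = 1, -1
    for i in range(len(seq) - k + 1):
        hits = sum(1 for x, y in zip(seq[i:i + k], template) if x == y)
        if hits > best_hits:  # keep the first window with the highest count
            best_start, best_hits = i + 1, hits
    return best_start, best_hits


print(best_window("XXGYTAIWLTPXX"))  # (3, 7)
```

    An ungapped identity count is the crudest possible score; the invariant glycine and proline at the template ends would be natural positions to weight more heavily in a less naive version.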

  4. Invariant glycines and prolines flanking in loops the strand beta 2 of various (alpha/beta)8-barrel enzymes: a hidden homology?

    PubMed

    Janecek, S

    1996-06-01

    The question of parallel (alpha/beta)8-barrel fold evolution remains unclear, owing mainly to the lack of sequence homology throughout the amino acid sequences of (alpha/beta)8-barrel enzymes. The "classical" approaches used in the search for homologies among (alpha/beta)8-barrels (e.g., production of structurally based alignments) have yielded alignments perfect from the structural point of view, but the approaches have been unable to reveal the homologies. These are proposed to be "hidden" in (alpha/beta)8-barrel enzymes. The term "hidden homology" means that the alignment of sequence stretches proposed to be homologous need not be structurally fully satisfactory. This is due to the very long evolutionary history of all (alpha/beta)8-barrels. This work identifies so-called hidden homology around the strand beta 2 that is flanked by loops containing invariant glycines and prolines in 17 different (alpha/beta)8-barrel enzymes, i.e., roughly in half of all currently known (alpha/beta)8-barrel proteins. The search was based on the idea that a conserved sequence region of an (alpha/beta)8-barrel enzyme should be more or less conserved also in the equivalent part of the structure of the other enzymes with this folding motif, given their mutual evolutionary relatedness. For this purpose, the sequence region around the well-conserved second beta-strand of alpha-amylase flanked by the invariant glycine and proline (56_GFTAIWITP, Aspergillus oryzae alpha-amylase numbering), was used as the sequence-structural template. The proposal that the second beta-strand of (alpha/beta)8-barrel fold is important from the evolutionary point of view is strongly supported by the increasing trend of the observed beta 2-strand structural similarity for the pairs of (alpha/beta)8-barrel enzymes: alpha-amylase and the alpha-subunit of tryptophan synthase, alpha-amylase and mandelate racemase, and alpha-amylase and cyclodextrin glycosyltransferase. 
This trend is also in agreement with the existing evolutionary division of the entire family of (alpha/beta)8-barrel proteins.

  5. A Symplectic Instanton Homology via Traceless Character Varieties

    NASA Astrophysics Data System (ADS)

    Horton, Henry T.

    Since its inception, Floer homology has been an important tool in low-dimensional topology. Floer theoretic invariants of 3-manifolds tend to be either gauge theoretic or symplecto-geometric in nature, and there is a general philosophy that each gauge theoretic Floer homology should have a corresponding symplectic Floer homology and vice-versa. In this thesis, we construct a Lagrangian Floer invariant for any closed, oriented 3-manifold Y (called the symplectic instanton homology of Y and denoted SI(Y)) which is conjecturally equivalent to a Floer homology defined using a certain variant of Yang-Mills gauge theory. The crucial ingredient for defining SI(Y) is the use of traceless character varieties in the symplectic setting, which allow us to avoid the debilitating technical hurdles present when one attempts to define a symplectic version of instanton Floer homologies. Floer theories are also expected to roughly satisfy the axioms of a topological quantum field theory (TQFT), and furthermore Dehn surgeries on knots should induce exact triangles of Floer homologies. Following a strategy used by Ozsvath and Szabo in the context of Heegaard Floer homology, we prove that our theory is functorial with respect to connected 4-dimensional cobordisms, so that cobordisms induce homomorphisms between symplectic instanton homologies. By studying the effect of Dehn surgeries on traceless character varieties, we establish a surgery exact triangle using work of Seidel that relates the geometry of Lefschetz fibrations with exact triangles in Lagrangian Floer theory. We further prove that Dehn surgeries on a link L in a 3-manifold Y induce a spectral sequence of symplectic instanton homologies - the E2-page is isomorphic to a direct sum of symplectic instanton homologies of all possible combinations of 0- and 1-surgeries on the components of L, and the spectral sequence converges to SI(Y). 
For the branched double cover Sigma(L) of a link L in S^3, we show there is a link surgery spectral sequence whose E2-page is isomorphic to the reduced Khovanov homology of L and which converges to the symplectic instanton homology of Sigma(L).

  6. Comparative analysis of ribosomal protein L5 sequences from bacteria of the genus Thermus.

    PubMed

    Jahn, O; Hartmann, R K; Boeckh, T; Erdmann, V A

    1991-06-01

    The genes for the ribosomal 5S rRNA binding protein L5 have been cloned from three extremely thermophilic eubacteria, Thermus flavus, Thermus thermophilus HB8 and Thermus aquaticus (Jahn et al., submitted). Genes for protein L5 from the three Thermus strains display 95% G/C in third positions of codons. Amino acid sequences deduced from the DNA sequence were shown to be identical for T. flavus and T. thermophilus, although the corresponding DNA sequences differed by two T to C transitions in the T. thermophilus gene. Protein L5 sequences from T. flavus and T. thermophilus are 95% homologous to L5 from T. aquaticus and 56.5% homologous to the corresponding E. coli sequence. The lowest degrees of homology were found between the T. flavus/T. thermophilus L5 proteins and those of yeast L16 (27.5%), Halobacterium marismortui (34.0%) and Methanococcus vannielii (36.6%). From sequence comparison it becomes clear that thermostability of Thermus L5 proteins is achieved by an increase in hydrophobic interactions and/or by restriction of steric flexibility due to the introduction of amino acids with branched aliphatic side chains such as leucine. Alignment of the nine protein sequences equivalent to Thermus L5 proteins led to identification of a conserved internal segment, rich in acidic amino acids, which shows homology to subsequences of E. coli L18 and L25. The occurrence of conserved sequence elements in 5S rRNA binding proteins and ribosomal proteins in general is discussed in terms of evolution and function.
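    The "95% G/C in third positions of codons" figure is the GC3 statistic: the fraction of codons whose third base is G or C. A minimal sketch, assuming an in-frame coding sequence whose length is a multiple of three; the example sequence is a made-up toy, not a Thermus gene.

```python
def gc3(cds: str) -> float:
    """Fraction of codons with G or C at the third position (GC3)."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    thirds = cds[2::3]  # every third base, starting at the first codon's 3rd position
    if not thirds:
        raise ValueError("empty sequence")
    return sum(1 for base in thirds if base in "GC") / len(thirds)


print(gc3("ATGGCCAAATTT"))  # 0.5 (third bases G, C, A, T)
```

    Third codon positions are often synonymous, which is why GC3 tends to track genome-wide base composition more closely than overall coding GC content.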

  7. Gene Discovery through Genomic Sequencing of Brucella abortus

    PubMed Central

    Sánchez, Daniel O.; Zandomeni, Ruben O.; Cravero, Silvio; Verdún, Ramiro E.; Pierrou, Ester; Faccio, Paula; Diaz, Gabriela; Lanzavecchia, Silvia; Agüero, Fernán; Frasch, Alberto C. C.; Andersson, Siv G. E.; Rossetti, Osvaldo L.; Grau, Oscar; Ugalde, Rodolfo A.

    2001-01-01

    Brucella abortus is the etiological agent of brucellosis, a disease that affects bovines and humans. We generated random DNA sequences from the genome of B. abortus strain 2308 in order to characterize molecular targets that might be useful for developing immunological or chemotherapeutic strategies against this pathogen. The partial sequencing of 1,899 clones allowed the identification of 1,199 genomic sequence surveys (GSSs) with high homology (BLAST expect value < 10^-5) to sequences deposited in the GenBank databases. Among them, 925 represent putative novel genes for the Brucella genus. Out of 925 nonredundant GSSs, 470 were classified in 15 categories based on cellular function. Seven hundred GSSs showed no significant database matches and remain available for further studies in order to identify their function. A high number of GSSs with homology to Agrobacterium tumefaciens and Rhizobium meliloti proteins were observed, thus confirming their close phylogenetic relationship. Among them, several GSSs showed high similarity with genes related to nodule nitrogen fixation, synthesis of nod factors, nodulation protein symbiotic plasmid, and nodule bacteroid differentiation. We have also identified several B. abortus homologs of virulence and pathogenesis genes from other pathogens, including a homolog to both the Shda gene from Salmonella enterica serovar Typhimurium and the AidA-1 gene from Escherichia coli. Other GSSs displayed significant homologies to genes encoding components of the type III and type IV secretion machineries, suggesting that Brucella might also have an active type III secretion machinery. PMID:11159979
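    The "BLAST expect value < 10^-5" criterion is a cutoff on the expected number of chance hits with at least the observed score. In its simplest (ungapped) Karlin-Altschul form, E = K m n exp(-lambda S) for raw score S and sequence lengths m and n, where K and lambda depend on the scoring system. The sketch below shows that formula and the cutoff filter; the `Hit` record, its fields and the parameter values in the example are hypothetical, and real BLAST output uses effective lengths and gapped statistics, so its reported E-values should be filtered directly rather than recomputed.

```python
import math
from typing import List, NamedTuple


class Hit(NamedTuple):
    query: str      # hypothetical read/GSS identifier
    subject: str    # hypothetical database sequence identifier
    evalue: float   # expect value as reported by the search tool


def karlin_altschul_evalue(score: float, m: int, n: int,
                           K: float, lam: float) -> float:
    """Ungapped Karlin-Altschul expect value: E = K * m * n * exp(-lambda * S)."""
    return K * m * n * math.exp(-lam * score)


def significant(hits: List[Hit], cutoff: float = 1e-5) -> List[Hit]:
    """Keep hits whose expect value is strictly below the cutoff."""
    return [h for h in hits if h.evalue < cutoff]


hits = [Hit("gss1", "p1", 1e-20), Hit("gss2", "p2", 0.01), Hit("gss3", "p3", 1e-5)]
print([h.query for h in significant(hits)])  # ['gss1']
```

    Because E scales with m times n, the same alignment score becomes less significant as the database grows, which is one reason a fixed cutoff such as 10^-5 is a convention rather than a universal threshold.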

  8. Combined sequence and structure analysis of the fungal laccase family.

    PubMed

    Kumar, S V Suresh; Phale, Prashant S; Durani, S; Wangikar, Pramod P

    2003-08-20

    Plant and fungal laccases belong to the family of multi-copper oxidases and show much broader substrate specificity than other members of the family. Laccases have consequently been of interest for potential industrial applications. We have analyzed the essential sequence features of fungal laccases based on multiple sequence alignments of more than 100 laccases. This has resulted in identification of a set of four ungapped sequence regions, L1-L4, as the overall signature sequences that can be used to identify the laccases, distinguishing them within the broader class of multi-copper oxidases. The 12 amino acid residues in the enzymes serving as the copper ligands are housed within these four identified conserved regions, of which L2 and L4 conform to the earlier reported copper signature sequences of multi-copper oxidases while L1 and L3 are distinctive to the laccases. The mapping of regions L1-L4 onto the three-dimensional structure of the Coprinus cinereus laccase indicates that many of the non-copper-ligating residues of the conserved regions could be critical in maintaining a specific, more or less C-2 symmetric, protein conformational motif characterizing the active site apparatus of the enzymes. The observed intraprotein homologies between L1 and L3 and between L2 and L4 at both the structure and the sequence levels suggest that the quasi C-2 symmetric active site conformational motif may have arisen from a structural duplication event that neither the sequence homology analysis nor the structure homology analysis alone would have unraveled. Although the sequence and structure homology is not detectable in the rest of the protein, the relative orientation of region L1 with L2 is similar to that of L3 with L4. 
The structure duplication of first-shell and second-shell residues has become cryptic because the intraprotein sequence homology noticeable for a given laccase becomes significant only after comparing the conservation pattern in several fungal laccases. The identified motifs, L1-L4, can be useful in searching the newly sequenced genomes for putative laccase enzymes. Copyright 2003 Wiley Periodicals, Inc. Biotechnol Bioeng 83: 386-394, 2003.

  9. Super elongation complex contains a TFIIF-related subcomplex

    PubMed Central

    Knutson, Bruce A.; Smith, Marissa L.; Walker-Kopp, Nancy; Xu, Xia

    2016-01-01

    Super elongation complex (SEC) belongs to a family of RNA polymerase II (Pol II) elongation factors with properties similar to those of TFIIF, a general transcription factor that increases the transcription elongation rate by reducing pausing. Although SEC has TFIIF-like functional properties, it apparently lacks sequence and structural homology to TFIIF. Using HHpred, we find that SEC contains an evolutionarily related TFIIF-like subcomplex. We show that the SEC subunit ELL interacts with the Pol II Rbp2 subunit, as expected for a TFIIF-like factor. These findings suggest a new model for how SEC functions as a Pol II elongation factor and how it suppresses Pol II pausing. PMID:27223670

  10. Structure and function of three novel MHC class I antigens derived from a C3H ultraviolet-induced fibrosarcoma

    PubMed Central

    1986-01-01

    The UV-induced C3H fibrosarcoma 1591 expresses at least three unique MHC class I antigens not found on normal C3H tissue. Here we report the complete DNA sequences of the three novel class I genes encoding these molecules, and describe in detail the recognition of the individual products by tumor-reactive and allospecific CTL. Remarkably, although C3H does not appear to express H-2L locus information, this C3H tumor expresses two distinct antigens, termed A149 and A166, which are extremely homologous to each other and to the H-2Ld antigen from BALB/c. The gene encoding the third novel class I antigen from 1591, A216, is quite homologous to H-2Kk throughout its 3' end. Since all three of these genes account for polymorphic restriction fragments not found in C3H, it is likely that they were derived by recombination from the endogenous class I genes of C3H. The DNA sequence homology of A149, A166, and H-2Ld is especially significant given the functional conservation observed between the products of these genes. Limited sequence substitutions appear to correlate with some of the discrete serological differences observed between these molecules. In addition, both A149 and A166 crossreact, but to differing extents, with H-2Ld at the level of T cell recognition. Our results are consistent with the view that CTL recognize complex conformational determinants on class I molecules, but extend previous observations by comparing a set of antigens with discrete and overlapping structural and functional differences. PMID:3489061

  11. Is the activity of CGRP and Adrenomedullin regulated by RAMP (-2) and (-3) in Trypanosomatidae? An in-silico approach.

    PubMed

    Febres, Anthony; Vanegas, Oriana; Giammarresi, Michelle; Gomes, Carlos; Díaz, Emilia; Ponte-Sucre, Alicia

    2018-07-01

    The Calcitonin-Like Receptor (CLR) belongs to the classical seven-transmembrane-segment receptors coupled to heterotrimeric G proteins. Its pharmacology depends on the simultaneous expression of the so-called Receptor Activity-Modifying Proteins (RAMP-) -1, -2 and -3. RAMP-associated proteins modulate glycosylation and cellular trafficking of CLR, thereby determining its pharmacodynamics. In higher eukaryotes, the complex formed by CLR and RAMP-1 preferentially binds Calcitonin Gene-Related Peptide (CGRP), whereas the complexes formed by CLR with RAMP-2 or RAMP-3 preferentially bind Adrenomedullin (AM). In lower eukaryotes, RAMPs or homologous proteins had not been identified until now. Herein we demonstrate a negative chemotactic response elicited by CGRP (10^-9 and 10^-8 M) and AM (10^-9 to 10^-5 M). Whether this response is receptor-mediated remains to be verified, as does the expression in Leishmania of a 24 kDa band recognized in western blots probed with human RAMP-2 antibodies. Queries of the Leishmania (Viannia) braziliensis predicted proteome with human RAMP-2 and RAMP-3 protein sequences using blastp detected two sequence alignments in the parasite: a RAMP-2-aligned sequence corresponding to Leishmania folylpolyglutamate synthase (FPGS), and a RAMP-3-aligned hypothetical Leishmania protein of unknown function. Homologs of these proteins were also identified in silico in other members of the Trypanosomatidae. These preliminary and incomplete data suggest that both CGRP and Adrenomedullin activities may be regulated by homologs of RAMP-2 and RAMP-3 in these parasites. Copyright © 2018 Elsevier B.V. All rights reserved.

  12. The VP35 and VP40 proteins of filoviruses. Homology between Marburg and Ebola viruses.

    PubMed

    Bukreyev, A A; Volchkov, V E; Blinov, V M; Netesov, S V

    1993-05-03

    Fragments of the genomic RNA sequences of Marburg (MBG) and Ebola (EBO) viruses are reported. These fragments were found to encode the VP35 and VP40 proteins. Canonical sequences were found before and after each open reading frame. It is suggested that these sequences are mRNA extremities and, at the same time, regulatory elements for mRNA transcription. Homology between the MBG and EBO proteins was discovered.

  13. Bhageerath-H: A homology/ab initio hybrid server for predicting tertiary structures of monomeric soluble proteins

    PubMed Central

    2014-01-01

    Background The advent of the human genome sequencing project has led to a surge in the number of protein sequences in the databanks. The success of structure-based drug discovery hinges critically on the availability of structures. Despite significant progress in experimental protein structure determination, the sequence-structure gap continues to widen. Data-driven, homology-based computational methods have proved successful in predicting tertiary structures for sequences sharing medium to high sequence similarity. As the similarity of query sequences dwindles, advanced homology/ab initio hybrid approaches are being explored to solve the structure prediction problem. Here we describe Bhageerath-H, a homology/ab initio hybrid software/server for predicting protein tertiary structures, with advancing drug design as one of its goals. Results The Bhageerath-H web server was validated on 75 CASP10 targets, showing TM-scores ≥0.5 in 91% of the cases and Cα RMSDs ≤5 Å from the native structure in 58% of the targets, which is well above the CASP10 watermark. Comparison with some leading servers demonstrated the uniqueness of the hybrid methodology in effectively sampling conformational space, scoring the best decoys and refining low-resolution models to high and medium resolution. Conclusion The Bhageerath-H methodology is available to the scientific community as a freely accessible web server. The methodology is being fielded in the ongoing CASP11 experiment. PMID:25521245
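    The Cα RMSD figure and the "fraction of targets passing a threshold" summaries can be made concrete with a short sketch. It assumes model and native Cα coordinates are already paired residue by residue and optimally superposed (the superposition step, e.g. the Kabsch algorithm, is not shown), and it does not compute TM-score, which needs its own length-dependent normalization and superposition search. The helper names and example numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """C-alpha RMSD for paired, already superposed coordinate lists."""
    if len(model) != len(native) or not model:
        raise ValueError("need equal-length, non-empty coordinate lists")
    squared = sum((a - b) ** 2
                  for p, q in zip(model, native)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(model))


def fraction_passing(values: List[float], threshold: float,
                     higher_is_better: bool) -> float:
    """Share of targets meeting a cutoff (e.g. TM >= 0.5 or RMSD <= 5.0)."""
    ok = [v >= threshold if higher_is_better else v <= threshold for v in values]
    return sum(ok) / len(values)


print(rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))       # 5.0
print(fraction_passing([0.6, 0.4, 0.5, 0.7], 0.5, True))  # 0.75
```

    Note the direction of each cutoff: higher TM-scores are better, whereas lower RMSDs are better, hence the explicit `higher_is_better` flag.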

  14. Partial mapping and sequencing of a fish iridovirus genome reveals genes homologous to the frog virus 3 p31, p40 and human eIF2alpha.

    PubMed

    Yu, Y X; Béarzotti, M; Vende, P; Ahne, W; Brémont, M

    1999-09-01

    Iridovirus-like pathogens have been recognized in recent years as a cause of serious systemic diseases among feral, cultured and ornamental fish. Fish mortalities of 30-100% due to systemic iridovirus infection have been observed in Europe, Australia, Japan and Thailand. Until now, the molecular biology of these important pathogens has been poorly documented. To gain better insight into the genomic organization of these piscine iridoviruses, we constructed a cosmid viral DNA library from epizootic hematopoietic necrosis virus (EHNV). Two recombinant cosmids (Cos7 and Cos12) were selected for systematic sequencing. Cos7 and Cos12 lie side by side along the genome and cover two-thirds of the EHNV genome, whose total length has been estimated at approximately 101.47 kb. Thirty-five kilobase pairs (kbp) from Cos7 and 10 kbp from Cos12 have been sequenced. Sequence analysis revealed open reading frames (ORFs) sharing homology with sequences from Frog virus 3, such as the p31 and p40 proteins. Among the other identified ORFs, some showed homology with known protein sequences, such as the human eIF2alpha protein, and some showed no significant homology with sequences available in the databases. However, none were related to Lymphocystis virus, a member of the Iridoviridae family whose full genome nucleotide sequence has been determined.

  15. Incorrectly predicted genes in rice?

    PubMed

    Cruveiller, Stéphane; Jabbari, Kamel; Clay, Oliver; Bernardi, Giorgio

    2004-05-26

    Between one third and one half of the proposed rice genes appear to have no homologs in other species, including Arabidopsis. Compositional considerations, and a comparison of curated rice sequences with ex novo predictions, suggest that many or most of the putative genes without homologs may be false positive predictions, i.e., sequences that are never translated into functional proteins in vivo.

  16. Homology and phylogeny and their automated inference

    NASA Astrophysics Data System (ADS)

    Fuellen, Georg

    2008-06-01

    The analysis of the ever-increasing amount of biological and biomedical data can be pushed forward by comparing the data within and among species. For example, an integrative analysis of data from the genome sequencing projects for various species traces the evolution of the genomes and identifies conserved and innovative parts. Here, I review the foundations and advantages of this “historical” approach and evaluate recent attempts at automating such analyses. Biological data is comparable if a common origin exists (homology), as is the case for members of a gene family originating via duplication of an ancestral gene. If the family has relatives in other species, we can assume that the ancestral gene was present in the ancestral species from which all the other species evolved. In particular, describing the relationships among the duplicated biological sequences found in the various species is often possible by a phylogeny, which is more informative than homology statements. Detecting and elaborating on common origins may answer how certain biological sequences developed, and predict what sequences are in a particular species and what their function is. Such knowledge transfer from sequences in one species to the homologous sequences of the other is based on the principle of ‘my closest relative looks and behaves like I do’, often referred to as ‘guilt by association’. To enable knowledge transfer on a large scale, several automated ‘phylogenomics pipelines’ have been developed in recent years, and seven of these will be described and compared. Overall, the examples in this review demonstrate that homology and phylogeny analyses, done on a large (and automated) scale, can give insights into function in biology and biomedicine.

  17. Genome Sequence of Microbulbifer mangrovi DD-13T Reveals Its Versatility to Degrade Multiple Polysaccharides.

    PubMed

    Imran, Md; Pant, Poonam; Shanbhag, Yogini P; Sawant, Samir V; Ghadi, Sanjeev C

    2017-02-01

    Microbulbifer mangrovi DD-13T is the type strain of a novel species isolated from the mangroves of Goa, India. The draft genome sequence of strain DD-13 comprises 4,528,106 bp with a G+C content of 57.15%. Out of 3479 open reading frames, functions for 3488 protein coding sequences were predicted on the basis of similarity with the clusters of orthologous groups. In addition to protein coding sequences, 34 tRNA genes and 3 rRNA genes were detected. Analysis of the predicted gene sequences using a Carbohydrate-Active Enzymes (CAZymes) Analysis Toolkit indicates that strain DD-13 encodes a large set of CAZymes, including 255 glycoside hydrolases, 76 carbohydrate esterases, 17 polysaccharide lyases, and 113 carbohydrate-binding modules (CBMs). Many genes from strain DD-13 were annotated as carbohydrases specific for the degradation of agar, alginate, carrageenan, chitin, xylan, pullulan, cellulose, starch, β-glucan, pectin, etc. Some of the polysaccharide-degrading genes were highly modular and carried at least one CBM, indicating the versatility of strain DD-13 in degrading complex polysaccharides. Growth of strain DD-13 was validated using pure polysaccharides such as agarose or alginate as the carbon source, as well as red and brown seaweed powder as substrate. The homologous carbohydrases produced by strain DD-13 during growth degraded the polysaccharides, ensuring the production of metabolizable reducing sugars. Additionally, several other polysaccharides such as carrageenan, xylan, pullulan, pectin, starch, and carboxymethyl cellulose were also confirmed as growth substrates for strain DD-13 and were associated with concomitant production of homologous carbohydrases.

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kennedy, M.A.; Morris, C.M.; Fitzgerald, P.H.

    The human kappa deleting element (Kde) mediates loss of CK and JK genes in B cells. A probe for Kde detects two genomic sequences on Southern blots. The Kde is located 24 kb 3′ to CK, but the position of the homologous sequence is unknown. The authors in situ hybridized m141-2 to metaphase cells of JC11, a B-cell line bearing a t(2;14)(p11;q32) in which the chromosome 2 breakpoint is within JK or the VK-JK intron. Three peaks of labelled sites were obtained. Southern analysis of BamH1-digested DNA showed that Kde (14 kb) and the homologous sequence (3 kb) were both intact. Kde accounts for hybridization to 14q+, and the 2p- signal presumably derives from the related sequence. This locates the sequence homologous to Kde upstream from JK, possibly within the VK cluster, and may reflect transposition or some other duplicative event as proposed for the evolution of other regions of the kappa locus.

  19. Identification of genes from pattern formation, tyrosine kinase, and potassium channel families by DNA amplification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kamb, A.; Weir, M.; Rudy, B.

    1989-06-01

    The study of gene family members has been aided by the isolation of related genes on the basis of DNA homology. The authors have adapted the polymerase chain reaction to screen animal genomes very rapidly and reliably for likely gene family members. Using conserved amino acid sequences to design degenerate oligonucleotide primers, they have shown that the genome of the nematode Caenorhabditis elegans contains sequences homologous to many Drosophila genes involved in pattern formation, including the segment polarity gene wingless (vertebrate int-1), and homeobox sequences characteristic of the Antennapedia, engrailed, and paired families. In addition, they have used this method to show that C. elegans contains at least five different sequences homologous to genes in the tyrosine kinase family. Lastly, they have isolated six potassium channel sequences from humans, a result that validates the utility of the method with large genomes and suggests that human potassium channel gene diversity may be extensive.
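    The abstract mentions designing degenerate primers from conserved amino acid sequences. The Python sketch below shows the general idea under stated assumptions. It back-translates a short peptide through the standard genetic code, takes the union of bases at each codon position, writes that union as an IUPAC ambiguity code, and multiplies the set sizes to give the primer's degeneracy. The function name is hypothetical, and this is not the authors' published procedure. A known limitation of per-position unions is visible for six-codon amino acids: Leu becomes YTN, which also matches the Phe codons TTT and TTC.

```python
from itertools import product
from typing import Tuple

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_STRING)}

# IUPAC nucleotide ambiguity codes keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}

def degenerate_primer(peptide: str) -> Tuple[str, int]:
    """Back-translate a peptide into a degenerate primer (hypothetical helper).

    Returns (primer, degeneracy), where degeneracy is the number of distinct
    oligonucleotides encoded by the ambiguity codes.
    """
    codes = []
    degeneracy = 1
    for aa in peptide.upper():
        codons = [codon for codon, a in CODON_TABLE.items() if a == aa]
        if not codons:
            raise ValueError(f"unknown amino acid: {aa!r}")
        for pos in range(3):
            bases = frozenset(codon[pos] for codon in codons)  # union at this position
            codes.append(IUPAC[bases])
            degeneracy *= len(bases)
    return "".join(codes), degeneracy

print(degenerate_primer("KC"))  # Lys (AAA/AAG) then Cys (TGT/TGC)
print(degenerate_primer("MW"))  # Met and Trp each have a single codon
```

    In practice, primer designers favour peptide stretches rich in single-codon residues (Met, Trp), because each six-codon residue multiplies the degeneracy by up to 16.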

  20. Genetic Diversity and Phylogenetic Analysis of the Iranian Leishmania Parasites Based on HSP70 Gene PCR-RFLP and Sequence Analysis.

    PubMed

    Nemati, Sara; Fazaeli, Asghar; Hajjaran, Homa; Khamesipour, Ali; Anbaran, Mohsen Falahati; Bozorgomid, Arezoo; Zarei, Fatah

    2017-08-01

    Despite the broad distribution of leishmaniasis among Iranians and animals across the country, little is known about the genetic characteristics of the causative agents. Applying both HSP70 PCR-RFLP and sequence analyses, this study aimed to evaluate the genetic diversity and phylogenetic relationships among Leishmania spp. isolated from Iranian endemic foci and available reference strains. A total of 36 Leishmania isolates from almost all districts across the country were genetically analyzed for the HSP70 gene using both PCR-RFLP and sequence analysis. The original HSP70 gene sequences were aligned along with homologous Leishmania sequences retrieved from NCBI, and subjected to the phylogenetic analysis. Basic parameters of genetic diversity were also estimated. The HSP70 PCR-RFLP presented 3 different electrophoretic patterns, with no further intraspecific variation, corresponding to 3 Leishmania species available in the country, L. tropica, L. major, and L. infantum. Phylogenetic analyses presented 5 major clades, corresponding to 5 species complexes. Iranian lineages, including L. major, L. tropica, and L. infantum, were distributed among 3 complexes L. major, L. tropica, and L. donovani. However, within the L. major and L. donovani species complexes, the HSP70 phylogeny was not able to distinguish clearly between the L. major and L. turanica isolates, and between the L. infantum, L. donovani, and L. chagasi isolates, respectively. Our results indicated that both HSP70 PCR-RFLP and sequence analyses are medically applicable tools for identification of Leishmania species in Iranian patients. However, the reduced genetic diversity of the target gene makes it inevitable that its phylogeny only resolves the major groups, namely, the species complexes.

  1. Phylo-mLogo: an interactive and hierarchical multiple-logo visualization tool for alignment of many sequences

    PubMed Central

    Shih, Arthur Chun-Chieh; Lee, DT; Peng, Chin-Lin; Wu, Yu-Wei

    2007-01-01

    Background When aligning several hundreds or thousands of sequences, such as epidemic virus sequences or homologous/orthologous sequences of large gene families, to reconstruct their epidemiological history or phylogenies, analyzing and visualizing the alignment of so many sequences has become a new challenge for computational biologists. Although several tools are available for visualizing very long sequence alignments, few of them are applicable to alignments of many sequences. Results A multiple-logo alignment visualization tool, called Phylo-mLogo, is presented in this paper. Phylo-mLogo calculates the variability and homogeneity of aligned sequences from base frequencies or entropies. Unlike traditional sequence logos, Phylo-mLogo displays not only the global logo patterns of the whole alignment but also the local homologous logos for each clade, arranged hierarchically. In addition, Phylo-mLogo allows the user to focus on important structurally or functionally constrained sites in the alignment, selected either by the user or by a built-in automatic calculation. Conclusion With Phylo-mLogo, the user can symbolically and hierarchically visualize hundreds of aligned sequences simultaneously and easily check the changes at their amino acid sites when analyzing many homologous/orthologous or influenza virus sequences. More information about Phylo-mLogo can be found at URL . PMID:17319966
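    The abstract says Phylo-mLogo quantifies column variability from base frequencies or entropies, both globally and per clade. As a minimal sketch, assuming Shannon entropy in bits with gap characters ignored, the Python below computes a per-column entropy profile for a whole alignment and for named clades. It is not Phylo-mLogo's implementation. The gap symbol, the clade dictionary, and the function names are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, List

def column_entropies(alignment: List[str], gap: str = "-") -> List[float]:
    """Shannon entropy (bits) of each alignment column, ignoring gaps.

    A fully conserved column scores 0; an all-gap column is also reported as 0.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    width = len(alignment[0])
    if any(len(s) != width for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for col in range(width):
        counts = Counter(s[col] for s in alignment if s[col] != gap)
        n = sum(counts.values())
        h = 0.0
        for c in counts.values():
            p = c / n
            h -= p * math.log2(p)
        entropies.append(h)
    return entropies

def clade_entropies(clades: Dict[str, List[str]]) -> Dict[str, List[float]]:
    """Per-clade entropy profiles, i.e. 'local' logos for each subgroup."""
    return {name: column_entropies(seqs) for name, seqs in clades.items()}

aln = ["ACGT", "ACGA", "ACTT", "ACTA"]
print(column_entropies(aln))
print(clade_entropies({"clade1": aln[:2], "clade2": aln[2:]}))
```

    A column can be conserved within each clade yet variable across the whole alignment. In the example, column 3 has 1 bit of entropy globally but 0 within each two-sequence clade. That contrast is what a hierarchical, per-clade display is designed to surface.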

  2. Single Molecule Visualization of Protein-DNA Complexes: Watching Machines at Work

    NASA Astrophysics Data System (ADS)

    Kowalczykowski, Stephen

    2013-03-01

    We can now watch individual proteins acting on single molecules of DNA. Such imaging provides unprecedented interrogation of fundamental biophysical processes. Visualization is achieved through the application of two complementary procedures. In one, single DNA molecules are attached to a polystyrene bead and are then captured by an optical trap. The DNA, a worm-like coil, is extended either by the force of solution flow in a micro-fabricated channel, or by capturing the opposite DNA end in a second optical trap. In the second procedure, DNA is attached by one end to a glass surface. The coiled DNA is elongated either by continuous solution flow or by subsequently tethering the opposite end to the surface. Protein action is visualized by fluorescent reporters: fluorescent dyes that bind double-stranded DNA (dsDNA), fluorescent biosensors for single-stranded DNA (ssDNA), or fluorescently-tagged proteins. Individual molecules are imaged using either epifluorescence microscopy or total internal reflection fluorescence (TIRF) microscopy. Using these approaches, we imaged the search for DNA sequence homology conducted by the RecA-ssDNA filament. The manner by which RecA protein finds a single homologous sequence in the genome had remained undefined for almost 30 years. Single-molecule imaging revealed that the search occurs through a mechanism termed “intersegmental contact sampling,” in which the randomly coiled structure of DNA is essential for reiterative sampling of DNA sequence identity: an example of parallel processing. In addition, the assembly of RecA filaments on single molecules of single-stranded DNA was visualized. Filament assembly requires nucleation of a protein dimer on DNA, and subsequent growth occurs via monomer addition. Furthermore, we discovered a class of proteins that catalyzed both nucleation and growth of filaments, revealing how the cell controls assembly of this protein-DNA complex.

  3. Analysis and functional classification of transcripts from the nematode Meloidogyne incognita

    PubMed Central

    McCarter, James P; Dautova Mitreva, Makedonka; Martin, John; Dante, Mike; Wylie, Todd; Rao, Uma; Pape, Deana; Bowers, Yvette; Theising, Brenda; Murphy, Claire V; Kloek, Andrew P; Chiapelli, Brandi J; Clifton, Sandra W; Bird, David Mck; Waterston, Robert H

    2003-01-01

    Background Plant parasitic nematodes are major pathogens of most crops. Molecular characterization of these species as well as the development of new techniques for control can benefit from genomic approaches. As an entrée to characterizing plant parasitic nematode genomes, we analyzed 5,700 expressed sequence tags (ESTs) from second-stage larvae (L2) of the root-knot nematode Meloidogyne incognita. Results From these, 1,625 EST clusters were formed and classified by function using the Gene Ontology (GO) hierarchy and the Kyoto KEGG database. L2 larvae, which represent the infective stage of the life cycle before plant invasion, express a diverse array of ligand-binding proteins and abundant cytoskeletal proteins. L2 are structurally similar to Caenorhabditis elegans dauer larva and the presence of transcripts encoding glyoxylate pathway enzymes in the M. incognita clusters suggests that root-knot nematode larvae metabolize lipid stores while in search of a host. Homology to other species was observed in 79% of translated cluster sequences, with the C. elegans genome providing more information than any other source. In addition to identifying putative nematode-specific and Tylenchida-specific genes, sequencing revealed previously uncharacterized horizontal gene transfer candidates in Meloidogyne with high identity to rhizobacterial genes including homologs of nodL acetyltransferase and novel cellulases. Conclusions With sequencing from plant parasitic nematodes accelerating, the approaches to transcript characterization described here can be applied to more extensive datasets and also provide a foundation for more complex genome analyses. PMID:12702207

  4. SH2-catalytic domain linker heterogeneity influences allosteric coupling across the SFK family.

    PubMed

    Register, A C; Leonard, Stephen E; Maly, Dustin J

    2014-11-11

    Src-family kinases (SFKs) make up a family of nine homologous multidomain tyrosine kinases whose misregulation is responsible for human disease (cancer, diabetes, inflammation, etc.). Despite overall sequence homology and identical domain architecture, differences in SH3 and SH2 regulatory domain accessibility and ability to allosterically autoinhibit the ATP-binding site have been observed for the prototypical SFKs Src and Hck. Biochemical and structural studies indicate that the SH2-catalytic domain (SH2-CD) linker, the intramolecular binding epitope for SFK SH3 domains, is responsible for allosterically coupling SH3 domain engagement to autoinhibition of the ATP-binding site through the conformation of the αC helix. As a relatively unconserved region between SFK family members, SH2-CD linker sequence variability across the SFK family is likely a source of nonredundant cellular functions between individual SFKs via its effect on the availability of SH3 and SH2 domains for intermolecular interactions and post-translational modification. Using a combination of SFKs engineered with enhanced or weakened regulatory domain intramolecular interactions and conformation-selective inhibitors that report αC helix conformation, this study explores how SH2-CD sequence heterogeneity affects allosteric coupling across the SFK family by examining Lyn, Fyn1, and Fyn2. Analyses of Fyn1 and Fyn2, isoforms that are identical but for a 50-residue sequence spanning the SH2-CD linker, demonstrate that SH2-CD linker sequence differences can have profound effects on allosteric coupling between otherwise identical kinases. Most notably, a dampened allosteric connection between the SH3 domain and αC helix leads to greater autoinhibitory phosphorylation by Csk, illustrating the complex effects of SH2-CD linker sequence on cellular function.

  5. Electrostatic Interactions between Elongated Monomers Drive Filamentation of Drosophila Shrub, a Metazoan ESCRT-III Protein.

    PubMed

    McMillan, Brian J; Tibbe, Christine; Jeon, Hyesung; Drabek, Andrew A; Klein, Thomas; Blacklow, Stephen C

    2016-08-02

    The endosomal sorting complex required for transport (ESCRT) is a conserved protein complex that facilitates budding and fission of membranes. It executes a key step in many cellular events, including cytokinesis and multi-vesicular body formation. The ESCRT-III protein Shrub in flies, or its homologs in yeast (Snf7) or humans (CHMP4B), is a critical polymerizing component of ESCRT-III needed to effect membrane fission. We report the structural basis for polymerization of Shrub and define a minimal region required for filament formation. The X-ray structure of the Shrub core shows that individual monomers in the lattice interact in a staggered arrangement using complementary electrostatic surfaces. Mutations that disrupt interface salt bridges interfere with Shrub polymerization and function. Despite substantial sequence divergence and differences in packing interactions, the arrangement of Shrub subunits in the polymer resembles that of Snf7 and other family homologs, suggesting that this intermolecular packing mechanism is shared among ESCRT-III proteins. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  6. Classical non-homologous end-joining pathway utilizes nascent RNA for error-free double-strand break repair of transcribed genes

    PubMed Central

    Chakraborty, Anirban; Tapryal, Nisha; Venkova, Tatiana; Horikoshi, Nobuo; Pandita, Raj K.; Sarker, Altaf H.; Sarkar, Partha S.; Pandita, Tej K.; Hazra, Tapas K.

    2016-01-01

    DNA double-strand breaks (DSBs) leading to loss of nucleotides in the transcribed region can be lethal. Classical non-homologous end-joining (C-NHEJ) is the dominant pathway for DSB repair (DSBR) in adult mammalian cells. Here we report that during such DSBR, mammalian C-NHEJ proteins form a multiprotein complex with RNA polymerase II and preferentially associate with the transcribed genes after DSB induction. Depletion of C-NHEJ factors significantly abrogates DSBR in transcribed but not in non-transcribed genes. We hypothesized that nascent RNA can serve as a template for restoring the missing sequences, thus allowing error-free DSBR. We indeed found pre-mRNA in the C-NHEJ complex. Finally, when a DSB-containing plasmid with several nucleotides deleted within the E. coli lacZ gene was allowed time to repair in lacZ-expressing mammalian cells, a functional lacZ plasmid could be recovered from control but not C-NHEJ factor-depleted cells, providing important mechanistic insights into C-NHEJ-mediated error-free DSBR of the transcribed genome. PMID:27703167

  7. Response to reactive nitrogen intermediates in Mycobacterium tuberculosis: induction of the 16-kilodalton alpha-crystallin homolog by exposure to nitric oxide donors.

    PubMed

    Garbe, T R; Hibler, N S; Deretic, V

    1999-01-01

    In contrast to the apparent paucity of Mycobacterium tuberculosis response to reactive oxygen intermediates, this organism has evolved a specific response to nitric oxide challenge. Exposure of M. tuberculosis to NO donors induces the synthesis of a set of polypeptides that have been collectively termed Nox. In this work, the most prominent Nox polypeptide, Nox16, was identified by immunoblotting and by N-terminal sequencing as the alpha-crystallin-related, 16-kDa small heat shock protein, sHsp16. A panel of chemically diverse donors of nitric oxide, with the exception of nitroprusside, induced sHsp16 (Nox16). Nitroprusside, a coordination complex of Fe2+ with a nitrosonium (NO+) ion, induced a 19-kDa polypeptide (Nox19) homologous to the nonheme bacterial ferritins. We conclude that the NO response in M. tuberculosis is dominated by increased synthesis of the alpha-crystallin homolog sHsp16, previously implicated in stationary-phase processes and found in this study to be a major M. tuberculosis protein induced upon exposure to reactive nitrogen intermediates.

  8. Oligo/Polynucleotide-Based Gene Modification: Strategies and Therapeutic Potential

    PubMed Central

    Sargent, R. Geoffrey; Kim, Soya

    2011-01-01

    Oligonucleotide- and polynucleotide-based gene modification strategies were developed as an alternative to transgene-based and classical gene targeting-based gene therapy approaches for the treatment of genetic disorders. Unlike transgene-based strategies, oligo/polynucleotide gene targeting approaches maintain gene integrity and the relationship between the protein coding and gene-specific regulatory sequences. Oligo/polynucleotide-based gene modification also has several advantages over classical vector-based homologous recombination approaches. These include essentially complete homology to the target sequence and the potential to rapidly engineer patient-specific oligo/polynucleotide gene modification reagents. Several oligo/polynucleotide-based approaches have been shown to successfully mediate sequence-specific modification of genomic DNA in mammalian cells. The strategies involve the use of small polynucleotide DNA fragments, triplex-forming oligonucleotides, and single-stranded oligodeoxynucleotides to mediate homologous exchange. The primary focus of this review is on the mechanistic aspects of the small fragment homologous replacement, triplex-forming oligonucleotide-mediated, and single-stranded oligodeoxynucleotide-mediated gene modification strategies as they relate to their therapeutic potential. PMID:21417933

  9. Evolution of DNA Replication Protein Complexes in Eukaryotes and Archaea

    PubMed Central

    Chia, Nicholas; Cann, Isaac; Olsen, Gary J.

    2010-01-01

    Background The replication of DNA in Archaea and eukaryotes requires several ancillary complexes, including proliferating cell nuclear antigen (PCNA), replication factor C (RFC), and the minichromosome maintenance (MCM) complex. Bacterial DNA replication utilizes comparable proteins, but these are distantly related phylogenetically to their archaeal and eukaryotic counterparts at best. Methodology/Principal Findings While the structures of each of the complexes do not differ significantly between the archaeal and eukaryotic versions thereof, the evolutionary dynamic in the two cases does. The number of subunits in each complex is constant across all taxa. However, they vary subtly with regard to composition. In some taxa the subunits are all identical in sequence, while in others some are homologous rather than identical. In the case of eukaryotes, there is no phylogenetic variation in the makeup of each complex—all appear to derive from a common eukaryotic ancestor. This is not the case in Archaea, where the relationship between the subunits within each complex varies taxon-to-taxon. We have performed a detailed phylogenetic analysis of these relationships in order to better understand the gene duplications and divergences that gave rise to the homologous subunits in Archaea. Conclusion/Significance This domain level difference in evolution suggests that different forces have driven the evolution of DNA replication proteins in each of these two domains. In addition, the phylogenies of all three gene families support the distinctiveness of the proposed archaeal phylum Thaumarchaeota. PMID:20532250

  10. A Sand Fly Salivary Protein Vaccine Shows Efficacy Against Vector-Transmitted Cutaneous Leishmaniasis in Nonhuman Primates

    DTIC Science & Technology

    2015-06-03

    demonstrating its immunogenicity in humans. PdSP15 sequence and structure show no homology to mammalian proteins, further demonstrating its potential...sequence or structure homology to known human proteins. The protective salivary antigen PdSP15 shares sequence homology only with the small odorant binding...salivary proteins PpSP15 and PsSP15, respectively (Fig. 4B). To exclude any structural similarities to human proteins, the crystal structure of PdSP15

  11. Finding similar nucleotide sequences using network BLAST searches.

    PubMed

    Ladunga, Istvan

    2009-06-01

    The Basic Local Alignment Search Tool (BLAST) is a keystone of bioinformatics due to its performance and user-friendliness. Beginner and intermediate users will learn how to design and submit blastn and Megablast searches on the Web pages at the National Center for Biotechnology Information. We map nucleic acid sequences to genomes, find identical or similar mRNA, expressed sequence tag, and noncoding RNA sequences, and run Megablast searches, which are much faster than blastn. Understanding results is assisted by taxonomy reports, genomic views, and multiple alignments. We interpret expected frequency thresholds, biological significance, and statistical significance. Weak hits are not evidence in themselves, only hints for further analyses. We find genes that may code for homologous proteins by translated BLAST. We reduce false positives by filtering out low-complexity regions. Parsed BLAST results can be integrated into analysis pipelines. Links in the output connect to Entrez, PubMed, structural, sequence, interaction, and expression databases. This facilitates integration with a wide spectrum of biological knowledge.
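    The abstract notes that false positives are reduced by filtering out low-complexity regions. To illustrate the idea only, the Python below implements a deliberately simplified filter. It slides a fixed window along a nucleotide sequence and soft-masks (lowercases) every base covered by a window whose Shannon entropy falls below a threshold. This is not DUST (used by blastn) or SEG (used for proteins). The window of 12 and the 1.5-bit threshold are arbitrary illustrative values, and the function names are hypothetical. Lowercase soft masking keeps the residues visible, so downstream tools can choose how harshly to treat them.

```python
import math
from collections import Counter

def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the letter composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def soft_mask_low_complexity(seq: str, window: int = 12, min_entropy: float = 1.5) -> str:
    """Lowercase every base covered by a low-entropy window (simplified filter).

    A homopolymer window has entropy 0 bits, a dinucleotide repeat such as
    ATAT... has 1 bit, and a window with all four bases equally frequent has 2 bits.
    """
    seq = seq.upper()
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = True
    return "".join(ch.lower() if m else ch for ch, m in zip(seq, masked))

print(soft_mask_low_complexity("ACGTACGTACGT" + "ATATATATATAT"))
```

    With these settings, the AT repeat and the two bases just before it are lowercased, while the balanced ACGT stretch is left untouched. Raising the window size or the threshold masks more aggressively.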

  12. Unusual genome complexity in Lactobacillus salivarius JCM1046.

    PubMed

    Raftis, Emma J; Forde, Brian M; Claesson, Marcus J; O'Toole, Paul W

    2014-09-08

    Lactobacillus salivarius strains are increasingly being exploited for their probiotic properties in humans and animals. Dissemination of antibiotic resistance genes among species with food or probiotic association is undesirable and is often mediated by plasmids or integrative and conjugative elements. L. salivarius strains typically have multireplicon genomes including circular megaplasmids that encode strain-specific traits for intestinal survival and probiotic activity. Linear plasmids are less common in lactobacilli and show a very limited distribution in L. salivarius. Here we present experimental evidence that supports an unusually complex multireplicon genome structure in the porcine isolate L. salivarius JCM1046. JCM1046 harbours a 1.83 Mb chromosome and four plasmids, which constitute 20% of the genome. In addition to the known 219 kb repA-type megaplasmid pMP1046A, we identified and experimentally validated the topology of three additional replicons: the circular pMP1046B (129 kb), a linear plasmid pLMP1046 (101 kb) and pCTN1046 (33 kb), which harbours a conjugative transposon. pMP1046B harbours both plasmid-associated replication genes and paralogues of chromosomally encoded housekeeping and information-processing related genes, thus qualifying it as a putative chromid. pLMP1046 shares limited sequence homology or gene synteny with other L. salivarius plasmids, and its putative replication-associated protein is homologous to the RepA/E proteins found in the large circular megaplasmids of L. salivarius. Plasmid pCTN1046 harbours a single copy of an integrated conjugative transposon (Tn6224), which appears to be functionally intact and includes the tetracycline resistance gene tetM. Experimental validation of sequence assemblies and plasmid topology resolved the complex genome architecture of L. salivarius JCM1046. A high-coverage draft genome sequence would not have elucidated the genome complexity in this strain. Given the expanding use of L. salivarius as a probiotic, it is important to determine the genotypic and phenotypic organization of L. salivarius strains. The identification of Tn6224-like elements in this species has implications for strain selection for probiotic applications.

  13. Membrane and Protein Interactions of the Pleckstrin Homology Domain Superfamily

    PubMed Central

    Lenoir, Marc; Kufareva, Irina; Abagyan, Ruben; Overduin, Michael

    2015-01-01

    The human genome encodes about 285 proteins that contain at least one annotated pleckstrin homology (PH) domain. As the first phosphoinositide-binding module to be discovered, the PH domain recruits diverse protein architectures to cellular membranes. PH domains constitute one of the largest protein superfamilies, and have diverged to regulate many different signaling proteins and modules such as Dbl homology (DH) and Tec homology (TH) domains. The ligands of approximately 70 PH domains have been validated by binding assays and complexed structures, allowing meaningful extrapolation across the entire superfamily. Here the Membrane Optimal Docking Area (MODA) program is used at a genome-wide level to identify all membrane-docking PH structures and map their lipid-binding determinants. In addition to the linear sequence motifs that are employed for phosphoinositide recognition, the three-dimensional structural features that allow peripheral membrane domains to approach and insert into the bilayer are pinpointed and can be predicted ab initio. The analysis shows that conserved structural surfaces distinguish PH domains that associate with membranes from those that do not. Moreover, the results indicate that lipid-binding PH domains can be classified into different functional subgroups based on the type of membrane insertion elements they project towards the bilayer. PMID:26512702

  14. RNA motif search with data-driven element ordering.

    PubMed

    Rampášek, Ladislav; Jimenez, Randi M; Lupták, Andrej; Vinař, Tomáš; Brejová, Broňa

    2016-05-18

    In this paper, we study the problem of RNA motif search in long genomic sequences. This approach uses a combination of sequence and structure constraints to uncover new distant homologs of known functional RNAs. The problem is NP-hard and is traditionally solved by backtracking algorithms. We have designed a new algorithm for RNA motif search and implemented a new motif search tool RNArobo. The tool enhances the RNAbob descriptor language, allowing insertions in helices, which enables better characterization of ribozymes and aptamers. A typical RNA motif consists of multiple elements and the running time of the algorithm is highly dependent on their ordering. By approaching the element ordering problem in a principled way, we demonstrate more than 100-fold speedup of the search for complex motifs compared to previously published tools. We have developed a new method for RNA motif search that allows for a significant speedup of the search of complex motifs that include pseudoknots. Such speed improvements are crucial at a time when the rate of DNA sequencing outpaces growth in computing. RNArobo is available at http://compbio.fmph.uniba.sk/rnarobo .

  15. Exploration of new perspectives and limitations in Agrobacterium-mediated gene transfer technology. Final report, June 1, 1992--May 31, 1995

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Marton, L.

    1996-02-01

    Genetic manipulation of plants often involves the introduction of homologous or partly homologous genes. Ectopic introduction of homologous sequences into plant genomes may trigger epigenetic changes, making expression of the genes unpredictable. The main project objective was to examine the feasibility of using Agrobacterium-mediated gene transfer for homologous gene targeting in plants.

  16. Super-resolution biomolecular crystallography with low-resolution data.

    PubMed

    Schröder, Gunnar F; Levitt, Michael; Brunger, Axel T

    2010-04-22

    X-ray diffraction plays a pivotal role in the understanding of biological systems by revealing atomic structures of proteins, nucleic acids and their complexes, with much recent interest in very large assemblies like the ribosome. As crystals of such large assemblies often diffract weakly (resolution worse than 4 Å), we need methods that work at such low resolution. In macromolecular assemblies, some of the components may be known at high resolution, whereas others are unknown: current refinement methods fail as they require a high-resolution starting structure for the entire complex. Determining the structure of such complexes, which are often of key biological importance, should be possible in principle as the number of independent diffraction intensities at a resolution better than 5 Å generally exceeds the number of degrees of freedom. Here we introduce a method that adds specific information from known homologous structures but allows global and local deformations of these homology models. Our approach uses the observation that local protein structure tends to be conserved as sequence and function evolve. Cross-validation with R(free) (the free R-factor) determines the optimum deformation and influence of the homology model. For test cases at 3.5-5 Å resolution with known structures at high resolution, our method gives significant improvements over conventional refinement in the model as monitored by coordinate accuracy, the definition of secondary structure and the quality of electron density maps. For re-refinements of a representative set of 19 low-resolution crystal structures from the Protein Data Bank, we find similar improvements. Thus, a structure derived from low-resolution diffraction data can have quality similar to a high-resolution structure. Our method is applicable to the study of weakly diffracting crystals using X-ray micro-diffraction as well as data from new X-ray light sources.
Use of homology information is not restricted to X-ray crystallography and cryo-electron microscopy: as optical imaging advances to subnanometre resolution, it can use similar tools.
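The cross-validation step relies on the free R-factor. As a reminder of the standard definition (not code from the paper), the crystallographic R-factor compares observed and calculated structure-factor amplitudes, R = Σ||F_obs| − |F_calc|| / Σ|F_obs|, and R(free) is the same quantity evaluated only on a test set of reflections withheld from refinement. A minimal sketch, with made-up amplitudes and a hypothetical test-set flag per reflection:

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |F_obs| - |F_calc| |) / sum(|F_obs|) over the given reflections."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, in_test_set):
    """Score the working set (used in refinement) and the withheld test set separately."""
    work = [(o, c) for o, c, t in zip(f_obs, f_calc, in_test_set) if not t]
    test = [(o, c) for o, c, t in zip(f_obs, f_calc, in_test_set) if t]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in test], [c for _, c in test])
    return r_work, r_free


# Toy amplitudes; the third reflection is flagged as belonging to the free set.
f_obs = [100.0, 100.0, 50.0, 50.0]
f_calc = [90.0, 110.0, 40.0, 50.0]
in_test_set = [False, False, True, False]
r_work, r_free = r_work_and_free(f_obs, f_calc, in_test_set)
print(round(r_work, 3), round(r_free, 3))  # 0.08 0.2
```

Because the test reflections never enter refinement, a model that overfits (for example, by deforming a homology model too freely) can lower R(work) without lowering R(free), which is why R(free) is a suitable criterion for choosing how much deformation to allow.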

  17. EUGENE'HOM: A generic similarity-based gene finder using multiple homologous sequences.

    PubMed

    Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

    2003-07-01

    EUGENE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGENE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGENE'HOM to handle sequences from a variety of organisms. The current target of EUGENE'HOM is plant sequences. The EUGENE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl.

  18. Editing Transgenic DNA Components by Inducible Gene Replacement in Drosophila melanogaster

    PubMed Central

    Lin, Chun-Chieh; Potter, Christopher J.

    2016-01-01

    Gene conversions occur when genomic double-strand DNA breaks (DSBs) trigger unidirectional transfer of genetic material from a homologous template sequence. Exogenous or mutated sequence can be introduced through this homology-directed repair (HDR). We leveraged gene conversion to develop a method for genomic editing of existing transgenic insertions in Drosophila melanogaster. The clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 system is used in the homology assisted CRISPR knock-in (HACK) method to induce DSBs in a GAL4 transgene, which is repaired by a single-genomic transgenic construct containing GAL4 homologous sequences flanking a T2A-QF2 cassette. With two crosses, this technique converts existing GAL4 lines, including enhancer traps, into functional QF2-expressing lines. We used HACK to convert the most commonly used GAL4 lines (labeling tissues such as neurons, fat, glia, muscle, and hemocytes) to QF2 lines. We also identified regions of the genome that exhibited differential efficiencies of HDR. The HACK technique is robust and readily adaptable for targeting and replacement of other genomic sequences, and could be a useful approach to repurpose existing transgenes as new genetic reagents become available. PMID:27334272

  19. Compartmentalization of the yeast meiotic nucleus revealed by analysis of ectopic recombination.

    PubMed

    Schlecht, Hélène B; Lichten, Michael; Goldman, Alastair S H

    2004-11-01

    As yeast cells enter meiosis, chromosomes move from a centromere-clustered (Rabl) to a telomere-clustered (bouquet) configuration and then to states of progressive homolog pairing where telomeres are more dispersed. It is uncertain at which stage of this process sequences commit to recombine with each other. Previous analyses using recombination between dispersed homologous sequences (ectopic recombination) support the view that, on average, homologs are aligned end to end by the time of commitment to recombination. We have undertaken further analyses incorporating new inserts, chromosome rearrangements, an alternate mode of recombination initiation, and mutants that disrupt nuclear structure or telomere metabolism. Our findings support previous conclusions and reveal that distance from the nearest telomere is an important parameter influencing recombination between dispersed sequences. In general, the farther dispersed sequences are from their nearest telomere, the less likely they are to engage in ectopic recombination. Neither the mode of initiating recombination nor the formation of the bouquet appears to affect this relationship. We suggest that aspects of telomere localization and behavior influence the organization and mobility of chromosomes along their entire length, during a critical period of meiosis I prophase that encompasses the homology search.

  20. Genome Wide Re-Annotation of Caldicellulosiruptor saccharolyticus with New Insights into Genes Involved in Biomass Degradation and Hydrogen Production

    PubMed Central

    Chowdhary, Nupoor; Selvaraj, Ashok; KrishnaKumaar, Lakshmi; Kumar, Gopal Ramesh

    2015-01-01

    Caldicellulosiruptor saccharolyticus has proven to be an excellent candidate for biological hydrogen (H2) production, but it still has major drawbacks, such as sensitivity to high osmotic pressure and low volumetric H2 productivity, that must be addressed before it can be used industrially. A whole-genome re-annotation was carried out to update the incomplete genome information, whose gaps hinder metabolic engineering efforts to improve the H2-producing capabilities of C. saccharolyticus. Whole-genome re-annotation was performed manually for 2,682 coding sequences (CDSs). Bioinformatics tools based on sequence similarity, motif search, phylogenetic analysis and fold recognition were employed for re-annotation. Our methodology added functions for 409 hypothetical proteins (HPs) and 46 proteins previously annotated as putative, and assigned more accurate functions to the known protein sequences. Homology-based gene annotation has been the standard method for assigning function to novel proteins, but over the past few years many non-homology-based methods for protein function prediction, such as genomic context approaches, have been developed. Using non-homology-based functional prediction methods, we were able to assign cellular processes or physical complexes to 249 hypothetical sequences. Our re-annotation pipeline also adds 231 new CDSs, generated from the MicroScope Platform, to the original genome, with functional predictions for 49 of them. The re-annotation of HPs and new CDSs is stored in a relational database available on the MicroScope web-based platform. In parallel, comparative genome analyses were performed among members of the genus Caldicellulosiruptor to understand their function and evolutionary processes.
Further, with results from integrated re-annotation studies (homology and genomic context approaches), we strongly suggest that Csac_0437 and Csac_0424 encode glycoside hydrolases (GH), which we propose are involved in the decomposition of recalcitrant plant polysaccharides. Similarly, the HPs Csac_0732, Csac_1862, Csac_1294 and Csac_0668 are suggested to play a significant role in biohydrogen production. Function prediction of these HPs using our integrated approach will considerably enhance the interpretation of large-scale experiments targeting this industrially important organism. PMID:26196387

  2. Cytokine-like factor-1, a novel soluble protein, shares homology with members of the cytokine type I receptor family.

    PubMed

    Elson, G C; Graber, P; Losberger, C; Herren, S; Gretener, D; Menoud, L N; Wells, T N; Kosco-Vilbois, M H; Gauchat, J F

    1998-08-01

    In this report we describe the identification, cloning, and expression pattern of human cytokine-like factor 1 (hCLF-1) and the identification and cloning of its murine homologue. They were identified from expressed sequence tags using amino acid sequences from conserved regions of the cytokine type I receptor family. Human CLF-1 and murine CLF-1 shared 96% amino acid identity and significant homology with many cytokine type I receptors. CLF-1 is a secreted protein, suggesting that it is either a soluble subunit within a cytokine receptor complex, like the soluble form of the IL-6R alpha-chain, or a subunit of a multimeric cytokine, e.g., IL-12 p40. The highest levels of hCLF-1 mRNA were observed in lymph node, spleen, thymus, appendix, placenta, stomach, bone marrow, and fetal lung, with constitutive expression of CLF-1 mRNA detected in a human kidney fibroblastic cell line. In fibroblast primary cell cultures, CLF-1 mRNA was up-regulated by TNF-alpha, IL-6, and IFN-gamma. Western blot analysis of recombinant forms of hCLF-1 showed that the protein has the tendency to form covalently linked di- and tetramers. These results suggest that CLF-1 is a novel soluble cytokine receptor subunit or part of a novel cytokine complex, possibly playing a regulatory role in the immune system and during fetal development.

  3. A Comprehensive Strategy for Accurate Mutation Detection of the Highly Homologous PMS2.

    PubMed

    Li, Jianli; Dai, Hongzheng; Feng, Yanming; Tang, Jia; Chen, Stella; Tian, Xia; Gorman, Elizabeth; Schmitt, Eric S; Hansen, Terah A A; Wang, Jing; Plon, Sharon E; Zhang, Victor Wei; Wong, Lee-Jun C

    2015-09-01

    Germline mutations in the DNA mismatch repair gene PMS2 underlie Lynch syndrome, a cancer susceptibility syndrome. However, accurate molecular testing of PMS2 is complicated by a large number of highly homologous sequences. To establish a comprehensive approach for mutation detection of PMS2, we have designed a strategy combining targeted capture next-generation sequencing (NGS), multiplex ligation-dependent probe amplification, and long-range PCR followed by NGS to simultaneously detect point mutations and copy number changes of PMS2. Exonic deletions (E2 to E9, E5 to E9, E8, E10, E14, and E1 to E15), duplications (E11 to E12), and a nonsense mutation, p.S22*, were identified. Traditional multiplex ligation-dependent probe amplification and Sanger sequencing approaches cannot differentiate the origin of the exonic deletions in the 3' region when PMS2 and PMS2CL share identical sequences as a result of gene conversion. Our approach allows unambiguous identification of mutations in the active gene with a straightforward long-range-PCR/NGS method. Breakpoint analysis of multiple samples revealed that recurrent exon 14 deletions are mediated by homologous Alu sequences. Our comprehensive approach provides a reliable tool for accurate molecular analysis of genes containing multiple copies of highly homologous sequences and should improve PMS2 molecular analysis for patients with Lynch syndrome. Copyright © 2015 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
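Copy-number results from multiplex ligation-dependent probe amplification are commonly expressed as dosage quotients: each probe's signal is normalized to reference probes in the same sample and then compared with a normal control, giving roughly 1.0 for two copies, 0.5 for a heterozygous deletion and 1.5 for a duplication. The sketch below assumes that scheme, mean-based normalization and cutoffs of 0.7 and 1.3; these choices, the probe names and the signal values are illustrative, and laboratory protocols differ. It also shows the limitation described in the abstract: a dosage quotient indicates that a copy is missing, not whether it was lost from PMS2 or from PMS2CL.

```python
from statistics import mean


def dosage_quotients(sample, control, reference_probes):
    """Per-probe dosage quotient: (probe / mean reference) in sample over the same in control.

    sample, control: dict mapping probe name -> peak signal (arbitrary units).
    reference_probes: probes assumed to be present at two copies in both samples.
    """
    sample_ref = mean(sample[p] for p in reference_probes)
    control_ref = mean(control[p] for p in reference_probes)
    return {
        probe: (sample[probe] / sample_ref) / (control[probe] / control_ref)
        for probe in sample
    }


def call_copy_number(dq, low=0.7, high=1.3):
    """Classify a dosage quotient using assumed, lab-specific cutoffs."""
    if dq < low:
        return "deletion"
    if dq > high:
        return "duplication"
    return "normal"


# Made-up peak signals for two reference probes and two hypothetical exon probes.
sample = {"REF1": 1000, "REF2": 1000, "E14": 500, "E11": 1500}
control = {"REF1": 800, "REF2": 800, "E14": 800, "E11": 800}
dqs = dosage_quotients(sample, control, ["REF1", "REF2"])
calls = {probe: call_copy_number(dq) for probe, dq in dqs.items()}
print(calls)  # {'REF1': 'normal', 'REF2': 'normal', 'E14': 'deletion', 'E11': 'duplication'}
```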

  4. Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations.

    PubMed

    Neuwald, Andrew F; Altschul, Stephen F

    2016-12-01

    Over evolutionary time, members of a superfamily of homologous proteins sharing a common structural core diverge into subgroups filling various functional niches. At the sequence level, such divergence appears as correlations that arise from residue patterns distinct to each subgroup. Such a superfamily may be viewed as a population of sequences corresponding to a complex, high-dimensional probability distribution. Here we model this distribution as hierarchical interrelated hidden Markov models (hiHMMs), which describe these sequence correlations implicitly. By characterizing such correlations one may hope to obtain information regarding functionally-relevant properties that have thus far evaded detection. To do so, we infer a hiHMM distribution from sequence data using Bayes' theorem and Markov chain Monte Carlo (MCMC) sampling, which is widely recognized as the most effective approach for characterizing a complex, high dimensional distribution. Other routines then map correlated residue patterns to available structures with a view to hypothesis generation. When applied to N-acetyltransferases, this reveals sequence and structural features indicative of functionally important, yet generally unknown biochemical properties. Even for sets of proteins for which nothing is known beyond unannotated sequences and structures, this can lead to helpful insights. We describe, for example, a putative coenzyme-A-induced-fit substrate binding mechanism mediated by arginine residue switching between salt bridge and π-π stacking interactions. A suite of programs implementing this approach is available (psed.igs.umaryland.edu).

  5. Homologous genetic recombination in the yellow head complex of nidoviruses infecting Penaeus monodon shrimp.

    PubMed

    Wijegoonawardane, Priyanjalie K M; Sittidilokratna, Nusra; Petchampai, Natthida; Cowley, Jeff A; Gudkovs, Nicholas; Walker, Peter J

    2009-07-20

    Yellow head virus (YHV) is a highly virulent pathogen of Penaeus monodon shrimp. It is one of six known genotypes in the yellow head complex of nidoviruses which also includes mildly pathogenic gill-associated virus (GAV, genotype 2) and four other genotypes (genotypes 3-6) that have been detected only in healthy shrimp. In this study, comparative phylogenetic analyses conducted on replicase- (ORF1b) and glycoprotein- (ORF3) gene amplicons identified 10 putative natural recombinants amongst 28 viruses representing all six genotypes from across the Indo-Pacific region. The approximately 4.6 kb genomic region spanning the two amplicons was sequenced for three putative recombinant viruses from Vietnam (genotype 3/5), the Philippines (genotype 5/2) and Indonesia (genotype 3/2). SimPlot analysis using these and representative parental virus sequences confirmed that each was a recombinant genotype and identified a recombination hotspot in a region just upstream of the ORF1b C-terminus. Maximum-likelihood breakpoint analysis predicted identical crossover positions in the Vietnamese and Indonesian recombinants, and a crossover position 12 nt upstream in the Philippine recombinant. Homologous genetic recombination in the same genome region was also demonstrated in recombinants generated experimentally in shrimp co-infected with YHV and GAV. The high frequency with which natural recombinants were identified indicates that genetic exchange amongst genotypes is occurring commonly in Asia and playing a significant role in expanding the genetic diversity in the yellow head complex. This is the first evidence of genetic recombination in viruses infecting crustaceans and has significant implications for the pathogenesis of infection and diagnosis of these newly emerging invertebrate pathogens.
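SimPlot-style recombination screening plots the similarity of a query genome to each candidate parent in windows along an alignment; a recombination breakpoint shows up where the most similar parent switches. A minimal sketch of that idea, assuming the sequences are already aligned to equal length and using plain percent identity (no evolutionary distance correction); the parent names and sequences are toy examples, and this is not SimPlot's implementation.

```python
def window_identity(query, ref, window, step):
    """Fraction of identical columns between two aligned sequences, per window.

    Columns with a gap ('-') in either sequence are skipped; a window with no
    comparable columns gets NaN. Returns (window_start, identity) pairs.
    """
    if len(query) != len(ref):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for start in range(0, len(query) - window + 1, step):
        cols = [(a, b) for a, b in zip(query[start:start + window], ref[start:start + window])
                if a != "-" and b != "-"]
        identity = sum(a == b for a, b in cols) / len(cols) if cols else float("nan")
        profile.append((start, identity))
    return profile


def closest_parent(query, parents, window, step):
    """For each window, report the parent sequence most similar to the query."""
    profiles = {name: window_identity(query, seq, window, step) for name, seq in parents.items()}
    starts = [start for start, _ in next(iter(profiles.values()))]
    return [(start, max(profiles, key=lambda name: profiles[name][i][1]))
            for i, start in enumerate(starts)]


# Toy recombinant: first half from parent_A, second half from parent_B.
parents = {"parent_A": "ACGTACGTAC" * 2, "parent_B": "TTGGCCAATT" * 2}
query = "ACGTACGTAC" + "TTGGCCAATT"
print(closest_parent(query, parents, window=10, step=10))  # [(0, 'parent_A'), (10, 'parent_B')]
```

With real genomes, overlapping windows (a step smaller than the window) give a smoother profile, and the breakpoint is then refined with a dedicated method such as the maximum-likelihood analysis mentioned in the abstract.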

  6. A novel PTCH1 mutation underlies non-syndromic cleft lip and/or palate in a Han Chinese family.

    PubMed

    Zhao, Huaxiang; Zhong, Wenjie; Leng, Chuntao; Zhang, Jieni; Zhang, Mengqi; Huang, Wenbin; Zhang, Yunfan; Li, Weiran; Jia, Peizeng; Lin, Jiuxiang; Maimaitili, Gulibaha; Chen, Feng

    2018-06-16

    Cleft lip and/or palate (CL/P) is the most common craniofacial congenital disease, and it has a complex aetiology. This study aimed to identify the causative gene mutation in a Han Chinese family with CL/P. Whole exome sequencing was conducted on the proband and her mother, who exhibited the same phenotype. A Mendelian dominant inheritance model, allele frequency, mutation regions, functional prediction and literature review were used to screen and filter the variants. The candidate was validated by Sanger sequencing. Conservation analysis and homology modelling were conducted. A heterozygous missense mutation c.1175C>T in the PTCH1 gene predicting p.Ala392Val was identified. This variant has not been reported and was predicted to be deleterious. Sanger sequencing verified the variant and the dominant inheritance model in the family. The missense alteration affects an amino acid that is evolutionarily conserved in the first extracellular loop of the PTCH1 protein. The local structure of the mutant protein was significantly altered according to homology modelling. Our findings suggest that c.1175C>T in PTCH1 (NM_000264) may be the causative mutation of this pedigree. Our results add to the evidence that PTCH1 variants play a role in the pathogenesis of orofacial clefts. This article is protected by copyright. All rights reserved.
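The reported variant c.1175C>T and its predicted effect p.Ala392Val are mutually consistent, which a little arithmetic confirms. Assuming standard HGVS coding-DNA numbering (c.1 is the A of the ATG start codon), nucleotide n falls in codon (n − 1) div 3 + 1, at position (n − 1) mod 3 + 1 within that codon. So c.1175 is the second base of codon 392, and a C>T change at the second position turns every alanine codon (GCN) into a valine codon (GTN). A short sketch of the check (not taken from the paper, and not reading the actual NM_000264 sequence):

```python
# Standard genetic code: all codons for the two residues involved.
ALA = {"GCT", "GCC", "GCA", "GCG"}
VAL = {"GTT", "GTC", "GTA", "GTG"}


def codon_of(c_pos):
    """Map an HGVS c. position to (codon number, position within codon, 1-based)."""
    if c_pos < 1:
        raise ValueError("coding positions start at c.1")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


codon, pos = codon_of(1175)
print(codon, pos)  # 392 2

# Apply C>T at that codon position to every alanine codon.
mutated = {c[:pos - 1] + "T" + c[pos:] for c in ALA}
print(mutated == VAL)  # True
```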

  7. CCTop: An Intuitive, Flexible and Reliable CRISPR/Cas9 Target Prediction Tool

    PubMed Central

    del Sol Keyer, Maria; Wittbrodt, Joachim; Mateo, Juan L.

    2015-01-01

    Engineering of the CRISPR/Cas9 system has opened a plethora of new opportunities for site-directed mutagenesis and targeted genome modification. Fundamental to this is a stretch of twenty nucleotides at the 5’ end of a guide RNA that provides specificity to the bound Cas9 endonuclease. Since a sequence of twenty nucleotides can occur multiple times in a given genome and some mismatches seem to be accepted by the CRISPR/Cas9 complex, an efficient and reliable in silico selection and evaluation of the targeting site is a key prerequisite for experimental success. Here we present the CRISPR/Cas9 target online predictor (CCTop, http://crispr.cos.uni-heidelberg.de) to overcome limitations of already available tools. CCTop provides an intuitive user interface with reasonable default parameters that can easily be tuned by the user. From a given query sequence, CCTop identifies and ranks all candidate sgRNA target sites according to their off-target quality and displays full documentation. CCTop was experimentally validated for gene inactivation, non-homologous end-joining and homology-directed repair. Thus, CCTop provides the bench biologist with a tool for the rapid and efficient identification of high-quality target sites. PMID:25909470
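To make the target-site problem concrete: for the widely used Streptococcus pyogenes Cas9, the 20-nucleotide protospacer must lie immediately 5' of an NGG protospacer-adjacent motif (PAM), a detail the abstract does not state. The sketch below is not CCTop's algorithm. It scans only the forward strand for such sites and flags genomic sites within a chosen number of mismatches of a guide. The function names, the mismatch cutoff and the toy sequences are hypothetical. A practical tool would also scan the reverse complement, use an indexed aligner for whole genomes, and weight mismatches by their position.

```python
import re


def find_targets(seq, spacer_len=20):
    """Forward-strand SpCas9 candidates: a spacer_len-nt protospacer followed by an NGG PAM.

    Returns (start, protospacer, pam) tuples; the lookahead reports overlapping sites.
    """
    seq = seq.upper()
    pattern = rf"(?=([ACGT]{{{spacer_len}}})([ACGT]GG))"
    return [(m.start(), m.group(1), m.group(2)) for m in re.finditer(pattern, seq)]


def mismatches(a, b):
    """Hamming distance between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def off_target_candidates(guide, genome, max_mismatches=3):
    """PAM-adjacent sites (forward strand only) within max_mismatches of the guide."""
    return [(start, site, mismatches(guide, site))
            for start, site, _pam in find_targets(genome)
            if mismatches(guide, site) <= max_mismatches]


guide = "GACGTTACCGGATCAAGTCA"  # hypothetical 20-nt spacer
# Toy "genome": the perfect target, then a copy differing at the last base.
genome = "TT" + guide + "AGG" + "CC" + guide[:-1] + "T" + "TGG"
print(off_target_candidates(guide, genome))
```

The toy genome contains the intended site at position 2 (0 mismatches) and a one-mismatch site at position 27, both next to an NGG PAM; a third PAM-adjacent site differs at too many positions and is filtered out.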

  8. New approaches to high-throughput structure characterization of SH3 complexes: the example of Myosin-3 and Myosin-5 SH3 domains from S. cerevisiae.

    PubMed

    Musi, Valeria; Birdsall, Berry; Fernandez-Ballester, Gregorio; Guerrini, Remo; Salvatori, Severo; Serrano, Luis; Pastore, Annalisa

    2006-04-01

    SH3 domains are small protein modules that are involved in protein-protein interactions in several essential metabolic pathways. The availability of the complete genome and the limited number of clearly identifiable SH3 domains make the yeast Saccharomyces cerevisiae an ideal proteomic-based model system to investigate the structural rules dictating SH3-mediated protein interactions and to develop new tools to assist these studies. In the present work, we have determined the solution structure of the SH3 domain from Myo3 and built a homology model of the corresponding domain of the highly homologous Myo5; both myosins are implicated in actin polymerization. We have then implemented an integrated approach that makes use of experimental and computational methods to characterize their binding properties. While accommodating their targets in the classical groove, the two domains show selectivity in both the orientation and the sequence of the target peptides. From our study, we propose a consensus sequence that may provide a useful guideline to identify new natural partners and suggest a strategy of more general applicability that may be of use in other structural proteomic studies.

  9. Structural insights into the anti-HIV activity of the Oscillatoria agardhii agglutinin homolog lectin family.

    PubMed

    Koharudin, Leonardus M I; Kollipara, Sireesha; Aiken, Christopher; Gronenborn, Angela M

    2012-09-28

    Oscillatoria agardhii agglutinin homolog (OAAH) proteins belong to a recently discovered lectin family. All members contain a sequence repeat of ~66 amino acids, with the number of repeats varying among different family members. Apart from data for the founding member OAA, neither three-dimensional structures, information about carbohydrate binding specificities, nor antiviral activity data have been available up to now for any other members of the OAAH family. To elucidate the structural basis for the antiviral mechanism of OAAHs, we determined the crystal structures of Pseudomonas fluorescens and Myxococcus xanthus lectins. Both proteins exhibit the same fold, resembling the founding family member, OAA, with minor differences in loop conformations. Carbohydrate binding studies by NMR and x-ray structures of glycan-lectin complexes reveal that the number of sugar binding sites corresponds to the number of sequence repeats in each protein. As for OAA, tight and specific binding to α3,α6-mannopentaose was observed. All the OAAH proteins described here exhibit potent anti-HIV activity at comparable levels. Altogether, our results provide structural details of the protein-carbohydrate interaction for this novel lectin family and insights into the molecular basis of their HIV inactivation properties.

  10. Developing a de novo targeted knock-in method based on in utero electroporation into the mammalian brain.

    PubMed

    Tsunekawa, Yuji; Terhune, Raymond Kunikane; Fujita, Ikumi; Shitamukai, Atsunori; Suetsugu, Taeko; Matsuzaki, Fumio

    2016-09-01

    Genome-editing technology has revolutionized the field of biology. Here, we report a novel de novo gene-targeting method mediated by in utero electroporation into the developing mammalian brain. Electroporation of donor DNA with the CRISPR/Cas9 system vectors successfully leads to knock-in of the donor sequence, such as EGFP, to the target site via the homology-directed repair mechanism. We developed a targeting vector system optimized to prevent anomalous leaky expression of the donor gene from the plasmid, which otherwise often occurs depending on the donor sequence. The knock-in efficiency of the electroporated progenitors reached up to 40% in the early stage and 20% in the late stage of the developing mouse brain. Furthermore, we inserted different fluorescent markers into the target gene in each homologous chromosome, successfully distinguishing homozygous knock-in cells by color. We also applied this de novo gene targeting to the ferret model for the study of complex mammalian brains. Our results demonstrate that this technique is widely applicable for monitoring gene expression, visualizing protein localization, lineage analysis and gene knockout, all at the single-cell level, in developmental tissues. © 2016. Published by The Company of Biologists Ltd.

  11. Differences in the phenotypic effects of mutations in homologous MrpA and MrpD subunits of the multi-subunit Mrp-type Na+/H+ antiporter.

    PubMed

    Morino, Masato; Ogoda, Shinichiro; Krulwich, Terry Ann; Ito, Masahiro

    2017-01-01

    Mrp antiporters are the sole antiporters in the Cation/Proton Antiporter 3 family of transporter databases because of their unusual structural complexity, 6-7 hydrophobic proteins that function as a hetero-oligomeric complex. The two largest and homologous subunits, MrpA and MrpD, are essential for antiport activity and have direct roles in ion transport. They also show striking homology with proton-conducting, membrane-embedded Nuo subunits of respiratory chain complex I of bacteria, e.g., Escherichia coli. MrpA has the closest homology to the complex I NuoL subunit and MrpD has the closest homology to the complex I NuoM and N subunits. Here, introduction of mutations in MrpD, in residues that are also present in MrpA, led to defects in antiport function and/or complex formation. No significant phenotypes were detected in strains with mutations in corresponding residues of MrpA, but site-directed changes in the C-terminal region of MrpA had profound effects, showing that the MrpA C-terminal region has indispensable roles in antiport function. The results are consistent with a divergence in adaptations that support the roles of MrpA and MrpD in secondary antiport, as compared to later adaptations supporting homologs in primary proton pumping by the respiratory chain complex I.

  12. Domain architecture conservation in orthologs

    PubMed Central

    2011-01-01

    Background As orthologous proteins are expected to retain function more often than other homologs, they are often used for functional annotation transfer between species. However, ortholog identification methods do not take into account changes in domain architecture, which are likely to modify a protein's function. By domain architecture we refer to the sequential arrangement of domains along a protein sequence. To assess the level of domain architecture conservation among orthologs, we carried out a large-scale study of such events between human and 40 other species spanning the entire evolutionary range. We designed a score to measure domain architecture similarity and used it to analyze differences in domain architecture conservation between orthologs and paralogs relative to the conservation of primary sequence. We also statistically characterized the extent of different types of domain swapping events across pairs of orthologs and paralogs. Results For homologs that have diverged beyond a certain threshold, the analysis shows that orthologs exhibit greater domain architecture conservation than paralogous homologs, even when differences in average sequence divergence are compensated for. We interpret this as an indication of a stronger selective pressure on orthologs than paralogs to retain the domain architecture required for the proteins to perform a specific function. In general, orthologs as well as the closest paralogous homologs have very similar domain architectures, even at large evolutionary separation. The most common domain architecture changes observed in both ortholog and paralog pairs involved insertion/deletion of new domains, while domain shuffling and segment duplication/deletion were very infrequent. Conclusions On the whole, our results support the hypothesis that function conservation between orthologs demands higher domain architecture conservation than other types of homologs, relative to primary sequence conservation.
This supports the notion that orthologs are functionally more similar than other types of homologs at the same evolutionary distance. PMID:21819573
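The abstract does not give the formula of the authors' domain architecture similarity score. Purely to illustrate what an order-aware score can look like (a hypothetical stand-in, not the published score), each architecture can be treated as the ordered list of its domains, and the length of their longest common subsequence (LCS) normalized as S = 2·LCS(a, b) / (|a| + |b|). S is 1 for identical architectures and decreases with domain insertions, deletions or shuffling. The Pfam-style domain names are examples only.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two domain lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise keep the best neighbour.
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def architecture_similarity(a, b):
    """Order-aware similarity in [0, 1]: 2 * LCS / (len(a) + len(b))."""
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


# Example architectures (illustrative only).
full = ["SH3", "SH2", "Pkinase"]
lost_sh3 = ["SH2", "Pkinase"]          # domain deletion
shuffled = ["Pkinase", "SH2"]          # same domains, different order
print(architecture_similarity(full, lost_sh3))      # 0.8
print(architecture_similarity(lost_sh3, shuffled))  # 0.5
```

Unlike a set-based measure such as the Jaccard index, this score distinguishes the shuffled architecture from the original even though both contain the same domains.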

  13. Highly divergent ancient gene families in metagenomic samples are compatible with additional divisions of life.

    PubMed

    Lopez, Philippe; Halary, Sébastien; Bapteste, Eric

    2015-10-26

    Microbial genetic diversity is often investigated by comparing relatively similar 16S molecules through multiple alignments between reference sequences and novel environmental samples, using phylogenetic trees, direct BLAST matches, or phylotype counts. However, are we missing novel lineages in the microbial dark universe by relying on standard phylogenetic and BLAST methods? If so, how can we probe that universe using alternative approaches? We performed a novel type of multi-marker analysis of genetic diversity exploiting the topology of inclusive sequence similarity networks. Our protocol identified 86 ancient gene families, well distributed and rarely transferred across the 3 domains of life, and retrieved their environmental homologs among 10 million predicted ORFs from human gut samples and other metagenomic projects. Numerous highly divergent environmental homologs were observed in gut samples, although the most divergent genes were over-represented in non-gut environments. In our networks, most divergent environmental genes grouped exclusively with uncultured relatives, in maximal cliques. Sequences within these groups were under strong purifying selection and presented a range of genetic variation comparable to that of a prokaryotic domain. Many gene families included environmental homologs that were highly divergent from cultured homologs: in 79 gene families (including 18 ribosomal proteins), Bacteria and Archaea were less divergent from each other than some groups of environmental sequences were from any cultured or viral homologs. Moreover, some groups of environmental homologs branched very deeply in phylogenetic trees of life, in cases where they were not too divergent to be aligned. These results underline how limited our understanding of the most diverse elements of the microbial world remains, and encourage a deeper exploration of natural communities and their genetic resources, hinting that major, still unknown divisions of life may yet be discovered.
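    In the sequence similarity networks described above, nodes are sequences and edges join pairs with a significant similarity hit; a maximal clique is a set of sequences that are all pairwise connected and cannot be extended by any further node. The abstract does not say how the authors enumerated cliques, so the following is only a minimal sketch using the standard Bron-Kerbosch algorithm with pivoting. The node names, the hit list, and the convention that an "env" prefix marks an uncultured environmental ORF are all hypothetical.

```python
# Sketch: enumerate maximal cliques in a sequence similarity network using
# Bron-Kerbosch with pivoting. Node names and hits are hypothetical examples.


def build_network(edges):
    """Build an undirected adjacency map from (seq_a, seq_b) similarity hits."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-hits
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def maximal_cliques(adj):
    """Return every maximal clique as a frozenset of node names."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: nodes that can extend r, x: nodes already explored
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the node with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical hits: "env*" = environmental ORFs, "cult*" = cultured homologs.
hits = [("env1", "env2"), ("env1", "env3"), ("env2", "env3"),
        ("env3", "cult1"), ("cult1", "cult2")]
network = build_network(hits)
cliques = maximal_cliques(network)

# Keep only cliques made exclusively of environmental sequences.
env_only = [c for c in cliques if all(n.startswith("env") for n in c)]
print(sorted(sorted(c) for c in env_only))  # [['env1', 'env2', 'env3']]
```

    Pivoting does not change which cliques are found; it only skips branches that could not yield a new maximal clique, which matters for networks built from millions of ORFs.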

  14. Structure of T7 RNA polymerase complexed to the transcriptional inhibitor T7 lysozyme.

    PubMed Central

    Jeruzalmi, D; Steitz, T A

    1998-01-01

    The T7 RNA polymerase-T7 lysozyme complex regulates phage gene expression during infection of Escherichia coli. The 2.8 Å crystal structure of the complex reveals that lysozyme binds at a site remote from the polymerase active site, suggesting an indirect mechanism of inhibition. Comparison of the T7 RNA polymerase structure with that of the homologous pol I family of DNA polymerases reveals identities in the catalytic site but also differences specific to RNA polymerase function. The structure of T7 RNA polymerase presented here differs significantly from a previously published structure. Sequence similarities between phage RNA polymerases and those from mitochondria and chloroplasts, when interpreted in the context of our revised model of T7 RNA polymerase, suggest a conserved fold. PMID:9670025

  15. A trait stacking system via intra-genomic homologous recombination.

    PubMed

    Kumar, Sandeep; Worden, Andrew; Novak, Stephen; Lee, Ryan; Petolino, Joseph F

    2016-11-01

    A gene targeting method has been developed for converting 'breeding stacks' of two unlinked transgenes into a single-locus 'molecular stack', thereby circumventing the breeding challenges associated with transgene segregation. The method utilizes intra-genomic homologous recombination (IGHR) between stably integrated target and donor loci that share sequence homology and nuclease cleavage sites; the donor contains a promoterless herbicide resistance transgene. Upon crossing with a zinc finger nuclease (ZFN)-expressing plant, double-strand breaks (DSBs) are created in both the stably integrated target and donor loci. DSBs flanking the donor locus result in intra-genomic mobilization of a promoterless, selectable marker-containing donor sequence, which can be used as a template for homology-directed repair of a concomitant DSB at the target locus, producing a functional selectable marker via nuclease-mediated cassette exchange (NMCE). The method was successfully demonstrated in maize using a glyphosate tolerance gene as the donor, whereby up to 3.3% of the resulting progeny embryos cultured on selection medium regenerated plants with the donor sequence integrated into the target locus. The process could be extended to multiple cycles of trait stacking by virtue of a unique intron sequence homology for NMCE between the target and donor loci. This is the first report that describes NMCE via IGHR, thereby enabling trait stacking using conventional crossing.

  16. Better Understanding of Homologous Recombination through a 12-Week Laboratory Course for Undergraduates Majoring in Biotechnology

    ERIC Educational Resources Information Center

    Li, Ming; Shen, Xiaodong; Zhao, Yan; Hu, Xiaomei; Hu, Fuquan; Rao, Xiancai

    2017-01-01

    Homologous recombination, a central concept in biology, is defined as the exchange of DNA strands between two similar or identical nucleotide sequences. Unfortunately, undergraduate students majoring in biotechnology often experience difficulties in understanding the molecular basis of homologous recombination. In this study, we developed and…

  17. The three-dimensional structure of "Lonely Guy" from Claviceps purpurea provides insights into the phosphoribohydrolase function of Rossmann fold-containing lysine decarboxylase-like proteins.

    PubMed

    Dzurová, Lenka; Forneris, Federico; Savino, Simone; Galuszka, Petr; Vrabka, Josef; Frébort, Ivo

    2015-08-01

    The recently discovered cytokinin (CK)-specific phosphoribohydrolase "Lonely Guy" (LOG) is a key enzyme of CK biosynthesis, converting inactive CK nucleotides into biologically active free bases. We have determined the crystal structures of LOG from Claviceps purpurea (cpLOG) and its complex with the enzymatic product phosphoribose. The structures reveal a dimeric arrangement of Rossmann folds, with the ligands bound to large pockets at the interface between cpLOG monomers. Structural comparisons highlight the homology of cpLOG to putative lysine decarboxylases. Extended sequence analysis enabled identification of a distinguishing LOG sequence signature. Taken together, our data suggest phosphoribohydrolase activity for several proteins of unknown function. © 2015 Wiley Periodicals, Inc.

  18. Domain structure, GTP-hydrolyzing activity and 7S RNA binding of Acidianus ambivalens ffh-homologous protein suggest an SRP-like complex in archaea.

    PubMed

    Moll, R; Schmidtke, S; Schäfer, G

    1999-01-01

    In this study we provide, for the first time, experimental evidence that a protein homologous to bacterial Ffh is part of an SRP-like ribonucleoprotein complex in hyperthermophilic archaea. The gene encoding the Ffh homologue in the hyperthermophilic archaeote Acidianus ambivalens has been cloned and sequenced. Recombinant Ffh protein was expressed in E. coli and subjected to biochemical and functional studies. The A. ambivalens ffh gene encodes a 50.4-kDa protein organized into three distinct regions: the N-terminal hydrophilic N-region (N), the GTP/GDP-binding domain (G) and a C-terminal C-domain (C). The A. ambivalens Ffh sequence shares 44-46% sequence similarity with Ffh of methanogenic archaea, 34-36% with eukaryal SRP54 and 30-34% with bacterial Ffh. A polyclonal antiserum raised against the first two domains of A. ambivalens Ffh reacts specifically with a single protein (apparent molecular mass 46 kDa, termed p46) present in cytosolic and plasma membrane fractions of A. ambivalens. Recombinant Ffh has a melting temperature of Tm = 89 °C. Its intrinsic GTPase activity evidently depends on neutral pH and low ionic strength, with a preference for chloride and acetate salts. The highest rates of GTP hydrolysis were achieved at 81 °C in the presence of 0.1-1 mM Mg2+. GTP hydrolysis is significantly inhibited by high glycerol concentrations, and its rate also decreases markedly on addition of detergents. The Km for GTP is 13.7 µM at 70 °C, and GTP hydrolysis is strongly inhibited by GDP (Ki = 8 µM). A. ambivalens Ffh, which includes an RNA-binding motif in its C-terminal domain, is shown to bind specifically to 7S RNA of the related crenarchaeote Sulfolobus solfataricus. Comparative sequence analysis reveals typical signal sequences in plasma membrane as well as extracellular proteins of hyperthermophilic crenarchaea, strongly suggesting that these proteins are recognized by an Ffh-containing SRP-like particle in these organisms.

  19. Bloom DNA Helicase Facilitates Homologous Recombination between Diverged Homologous Sequences

    PubMed Central

    Kikuchi, Koji; Abdel-Aziz, H. Ismail; Taniguchi, Yoshihito; Yamazoe, Mitsuyoshi; Takeda, Shunichi; Hirota, Kouji

    2009-01-01

    Bloom syndrome, caused by inactivation of the Bloom DNA helicase (Blm), is characterized by elevated levels of sister chromatid exchange, a form of homologous recombination (HR) associated with crossover. Blm is therefore believed to act as an anti-recombinase. In Drosophila, however, DmBlm is specifically required to promote synthesis-dependent strand annealing (SDSA), a type of HR not associated with crossover. Whether this function of Blm in SDSA is conserved in higher eukaryotes has been a matter of debate. Here, we demonstrate the function of Blm in SDSA-type HR in the chicken DT40 B lymphocyte line, where Ig gene conversion diversifies the immunoglobulin V gene through intragenic HR between diverged homologous segments. This reaction is initiated by activation-induced cytidine deaminase-mediated uracil formation at the V gene; the uracil is in turn converted into an abasic site, presumably leading to a single-strand gap. Ig gene conversion frequency was drastically reduced in BLM−/− cells. In addition, in Ig gene conversion events BLM−/− cells used a limited set of donor segments with higher identity than other segments, suggesting that Blm can promote HR between diverged sequences. To further understand the role of Blm in HR between diverged homologous sequences, we measured the frequency of gene targeting induced by an I-SceI endonuclease-mediated double-strand break. BLM−/− cells showed a more severe defect in gene targeting frequency as the number of heterologous sequences at the double-strand break site increased. Conversely, overexpression of Blm, even of an ATPase-defective mutant, strongly stimulated gene targeting. In summary, Blm promotes HR between diverged sequences through a novel ATPase-independent mechanism. PMID:19661064

  20. Comparative analyses of putative toxin gene homologs from an Old World viper, Daboia russelii

    PubMed Central

    Krishnan, Neeraja M.

    2017-01-01

    Availability of snake genome sequences has opened up exciting areas of research on comparative genomics and gene diversity. One of the challenges in studying snake genomes is the acquisition of biological material from live animals, especially venomous ones, which makes the process cumbersome and time-consuming. Here, we report comparative sequence analyses of putative toxin gene homologs from Russell’s viper (Daboia russelii) using whole-genome sequencing data obtained from shed skin. When compared with the major venom proteins of Russell’s viper studied previously, we found 45–100% sequence similarity between the venom proteins and their putative homologs in the skin. Additionally, comparative analyses of 20 putative toxin gene family homologs provided evidence of unique sequence motifs in nerve growth factor (NGF), platelet-derived growth factor (PDGF), Kunitz/bovine pancreatic trypsin inhibitor (Kunitz BPTI), cysteine-rich secretory proteins, antigen 5, and pathogenesis-related 1 proteins (CAP) and cysteine-rich secretory protein (CRISP). In these derived proteins, in silico structure-based analysis identified V11 and T35 in the NGF domain; F23 and A29 in the PDGF domain; N69, K2 and A5 in the CAP domain; and Q17 in the CRISP domain as responsible for differences in the largest pockets across the protein domain structures in crotalines, viperines and elapids. Similarly, residues F10, Y11 and E20 appear to play an important role in the protein structures across the Kunitz protein domain of viperids and elapids. Our study highlights the usefulness of shed skin for obtaining good-quality, high-molecular-weight DNA for comparative genomic studies, and provides evidence of unique features and the evolution of putative venom gene homologs in vipers. PMID:29230357

  1. The site-specific ribosomal insertion element type II of Bombyx mori (R2Bm) contains the coding sequence for a reverse transcriptase-like enzyme.

    PubMed Central

    Burke, W D; Calalang, C C; Eickbush, T H

    1987-01-01

    Two classes of DNA elements interrupt a fraction of the rRNA repeats of Bombyx mori. We have analyzed, by genomic blotting and sequence analysis, one class of these elements, which we have named R2. These elements occupy approximately 9% of the rDNA units of B. mori and appear to be homologous to the type II rDNA insertions detected in Drosophila melanogaster. Approximately 25 copies of R2 exist within the B. mori genome, of which at least 20 are located at a precise location within otherwise typical rDNA units. Nucleotide sequence analysis has revealed that the 4.2-kilobase-pair R2 element has a single large open reading frame, occupying over 82% of the total length of the element. The central region of this 1,151-amino-acid open reading frame shows homology to the reverse transcriptase enzymes found in retroviruses and certain transposable elements. Amino acid homology of this region is highest to the mobile LINE-1 (L1) elements of mammals, followed by the mitochondrial type II introns of fungi, and the pol gene of retroviruses. Less homology exists with transposable elements of D. melanogaster and Saccharomyces cerevisiae. Two additional regions of sequence homology between L1 and R2 elements were also found outside the reverse transcriptase region. We suggest that the R2 elements are retrotransposons that are site specific in their insertion into the genome. Such mobility would enable these elements to occupy a small fraction of the rDNA units of B. mori despite their continual elimination from the rDNA locus by sequence turnover. PMID:2439905

  2. Clustering of Genetically Defined Allele Classes in the Caenorhabditis elegans DAF-2 Insulin/IGF-1 Receptor

    PubMed Central

    Patel, Dhaval S.; Garza-Garcia, Acely; Nanji, Manoj; McElwee, Joshua J.; Ackerman, Daniel; Driscoll, Paul C.; Gems, David

    2008-01-01

    The DAF-2 insulin/IGF-1 receptor regulates development, metabolism, and aging in the nematode Caenorhabditis elegans. However, complex differences among daf-2 alleles complicate analysis of this gene. We have employed epistasis analysis, transcript profile analysis, mutant sequence analysis, and homology modeling of mutant receptors to understand this complexity. We define an allelic series of nonconditional daf-2 mutants, including nonsense and deletion alleles, and a putative null allele, m65. The most severe daf-2 alleles show incomplete suppression by daf-18(0) and daf-16(0) and have a range of effects on early development. Among weaker daf-2 alleles there exist distinct mutant classes that differ in epistatic interactions with mutations in other genes. Mutant sequence analysis (including 11 newly sequenced alleles) reveals that class 1 mutant lesions lie only in certain extracellular regions of the receptor, while class 2 (pleiotropic) and nonconditional missense mutants have lesions only in the ligand-binding pocket of the receptor ectodomain or the tyrosine kinase domain. Effects of equivalent mutations on the human insulin receptor suggest an altered balance of intracellular signaling in class 2 alleles. These studies consolidate and extend our understanding of the complex genetics of daf-2 and its underlying molecular biology. PMID:18245374

  3. Sequence analysis of malacoherpesvirus proteins: Pan-herpesvirus capsid module and replication enzymes with an ancient connection to "Megavirales".

    PubMed

    Mushegian, Arcady; Karin, Eli Levy; Pupko, Tal

    2018-01-01

    The order Herpesvirales includes animal viruses with large double-strand DNA genomes replicating in the nucleus. The main capsid protein in the best-studied family Herpesviridae contains a domain with HK97-like fold related to bacteriophage head proteins, and several virion maturation factors are also homologous between phages and herpesviruses. The origin of herpesvirus DNA replication proteins is less well understood. While analyzing the genomes of herpesviruses in the family Malacoherpesviridae, we identified nearly 30 families of proteins conserved in other herpesviruses, including several phage-related domains in morphogenetic proteins. Herpesvirus DNA replication factors have a complex evolutionary history: some are related to cellular proteins, but others are closer to homologs from large nucleocytoplasmic DNA viruses. Phylogenetic analyses suggest that the core replication machinery of herpesviruses may have been recruited from the same pool as in the case of other large DNA viruses of eukaryotes. Published by Elsevier Inc.

  4. Molecular cloning and nucleotide sequence of a transforming gene detected by transfection of chicken B-cell lymphoma DNA

    NASA Astrophysics Data System (ADS)

    Goubin, Gerard; Goldman, Debra S.; Luce, Judith; Neiman, Paul E.; Cooper, Geoffrey M.

    1983-03-01

    A transforming gene detected by transfection of chicken B-cell lymphoma DNA has been isolated by molecular cloning. It is homologous to a conserved family of sequences present in normal chicken and human DNAs but is not related to transforming genes of acutely transforming retroviruses. The nucleotide sequence of the cloned transforming gene suggests that it encodes a protein that is partially homologous to the amino terminus of transferrin and related proteins although only about one tenth the size of transferrin.

  5. EUGÈNE'HOM: a generic similarity-based gene finder using multiple homologous sequences

    PubMed Central

    Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

    2003-01-01

    EUGÈNE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGÈNE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGÈNE'HOM to handle sequences from a variety of organisms. The current target of EUGÈNE'HOM is plant sequences. The EUGÈNE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl. PMID:12824408

  6. Developmental rearrangement of cyanobacterial nif genes: nucleotide sequence, open reading frames, and cytochrome P-450 homology of the Anabaena sp. strain PCC 7120 nifD element.

    PubMed Central

    Lammers, P J; McLaughlin, S; Papin, S; Trujillo-Provencio, C; Ryncarz, A J

    1990-01-01

    An 11-kbp DNA element of unknown function interrupts the nifD gene in vegetative cells of Anabaena sp. strain PCC 7120. In developing heterocysts the nifD element excises from the chromosome via site-specific recombination between short repeat sequences that flank the element. The nucleotide sequence of the nifH-proximal half of the element was determined to elucidate the genetic potential of the element. Four open reading frames with the same relative orientation as the nifD element-encoded xisA gene were identified in the sequenced region. Each of the open reading frames was preceded by a reasonable ribosome-binding site and had biased codon utilization preferences consistent with low levels of expression. Open reading frame 3 was highly homologous with three cytochrome P-450 omega-hydroxylase proteins and showed regional homology to functionally significant domains common to the cytochrome P-450 superfamily. The sequence encoding open reading frame 2 was the most highly conserved portion of the sequenced region based on heterologous hybridization experiments with three genera of heterocystous cyanobacteria. PMID:2123860

  7. Characterization and mapping of cDNA encoding aspartate aminotransferase in rice, Oryza sativa L.

    PubMed

    Song, J; Yamamoto, K; Shomura, A; Yano, M; Minobe, Y; Sasaki, T

    1996-10-31

    Fifteen cDNA clones, putatively identified as encoding aspartate aminotransferase (AST, EC 2.6.1.1), were isolated and partially sequenced. Together with six previously isolated clones putatively identified to encode ASTs (Sasaki et al. 1994, Plant Journal 6, 615-624), their sequences were characterized and classified into 4 cDNA species. Two of the isolated clones, C60213 and C2079, were full-length cDNAs, and their complete nucleotide sequences were determined. C60213 was 1612 bp long and its deduced amino acid sequence showed 88% homology with that of Panicum miliaceum L. mitochondrial AST. The C60213-encoded protein had an N-terminal amino acid sequence characteristic of a mitochondrial transit peptide. C2079, on the other hand, was 1546 bp long and had 91% amino acid sequence homology with P. miliaceum L. cytosolic AST but lacked the transit peptide sequence. The homologies between C2079 and C60213 were 54% at the nucleotide level and 52% at the deduced amino acid level. C2079 and C60213 were mapped to chromosomes 1 and 6, respectively, by restriction fragment length polymorphism linkage analysis. Northern blot analysis using C2079 as a probe revealed much higher transcript levels in callus and root than in green and etiolated shoots, suggesting tissue-specific variation in AST gene expression.
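    Strictly, homology is a yes/no statement about common ancestry, so percentages such as the "88% homology" quoted in this abstract are measures computed over a pairwise alignment, most likely percent identity or similarity (the abstract does not say which). A minimal sketch of percent identity follows, assuming the two sequences are already aligned with "-" marking gaps and using the convention of dividing by columns where neither sequence has a gap. Tools differ in the denominator (alignment length, shorter sequence length), so such figures are not always comparable across papers; the example alignment is made up.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over alignment columns with no gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap
    ]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


# Made-up alignment: 5 ungapped columns, 4 identical.
print(percent_identity("ACGT-A", "ACCTTA"))  # 80.0
```

    Counting gapped columns in the denominator instead would give 4/6, about 66.7%, for the same alignment, which is why the convention should be stated alongside the number.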

  8. Echinococcus granulosus Sensu Stricto in Dogs and Jackals from Caspian Sea Region, Northern Iran

    PubMed Central

    Gholami, Shirzad; Jahandar, Hefzallah; Abastabar, Mahdi; Pagheh, Abdolsatar; Mobedi, Iraj; Sharbatkhori, Mitra

    2016-01-01

    Background: The aim of the present study was to genotype Echinococcus granulosus isolates from dogs and jackals in Mazandaran Province, northern Iran, using a partial sequence of the mitochondrial cytochrome c oxidase subunit 1 gene (cox1). Methods: E. granulosus isolates (n = 15) were collected from 42 stray dogs and 16 jackals found south of the Caspian Sea in northern Iran. After morphological study, the isolates were genetically characterized using consensus sequences (366 bp) of the cox1 gene. Phylogenetic analysis of cox1 nucleotide sequence data was performed using Bayesian inference. Results: Four different sequences were observed among the isolates. Two genotypes [G1 (66.7%) and G3 (33.3%)] were identified among the isolates. The G1 sequences comprised three sequence profiles. One profile (Maz1) had 100% homology with the reference sequence (AN: KP339045). Two other profiles, designated Maz2 and Maz3, had 99% homology with the G1 genotype (ANs: KP339046 and KP339047). A G3 sequence designated Maz4 showed 100% homology with a G3 reference sequence (AN: KP339048). Conclusion: These results emphasize the G1 genotype of E. granulosus sensu stricto as a frequent genotype in dogs. This study provides the first molecular characterization of E. granulosus in the province. PMID:28096852

  9. Gene Unprediction with Spurio: A tool to identify spurious protein sequences.

    PubMed

    Höps, Wolfram; Jeffryes, Matt; Bateman, Alex

    2018-01-01

    We now have access to the sequences of tens of millions of proteins. These protein sequences are essential for modern molecular biology and computational biology. The vast majority of protein sequences are derived from gene prediction tools and have no experimental evidence supporting their translation. Despite the increasing accuracy of gene prediction tools, the sequence databases likely contain a large number of spurious protein predictions. We have developed the Spurio tool to help identify spurious protein predictions in prokaryotes. Spurio searches the query protein sequence against a prokaryotic nucleotide database using tblastn and identifies homologous sequences. The tblastn matches are used to score the query sequence's likelihood of being a spurious protein prediction using a Gaussian process model. The most informative feature is the appearance of stop codons within the presumed translation of homologous DNA sequences. Benchmarking shows that the Spurio tool is able to distinguish spurious from true proteins. However, transposon proteins are prone to being predicted as spurious because of the frequency of degraded homologs found in the DNA sequence databases. Our initial experiments suggest that less than 1% of the proteins in the UniProtKB sequence database are likely to be spurious and that Spurio is able to identify over 60 times more spurious proteins than the AntiFam resource. The Spurio software and source code is available under an MIT license at the following URL: https://bitbucket.org/bateman-group/spurio.
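    Spurio's most informative feature, stop codons inside the presumed translation of homologous DNA, can be illustrated with a plain translation step. The sketch below is not Spurio's code: Spurio takes its translations from tblastn matches and feeds several features into a Gaussian process model, neither of which is attempted here. It only translates one reading frame with the standard genetic code (NCBI translation table 1) and counts stop codons that are not at the very end; the function names are hypothetical.

```python
# Sketch: translate DNA in one reading frame with the standard genetic code
# (NCBI translation table 1) and count internal stop codons.

BASES = "TCAG"
# Amino acids for codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate complete codons starting at `frame`; unknown codons become 'X'."""
    dna = dna.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(dna[pos:pos + 3], "X")
        for pos in range(frame, len(dna) - 2, 3)
    )


def internal_stop_count(dna: str, frame: int = 0) -> int:
    """Count stop codons in the translation, ignoring any trailing stops."""
    return translate(dna, frame).rstrip("*").count("*")


print(translate("ATGAAATAG"))               # MK*
print(internal_stop_count("ATGTAAAAATAG"))  # 1
```

    A genuine protein-coding homolog should translate with few or no internal stops in the matching frame, whereas a spurious prediction often lines up with DNA that is interrupted by them; the caveat in the abstract about degraded transposon homologs follows directly from this.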

  10. MolIDE: a homology modeling framework you can click with.

    PubMed

    Canutescu, Adrian A; Dunbrack, Roland L

    2005-06-15

    Molecular Integrated Development Environment (MolIDE) is an integrated application designed to provide homology modeling tools and protocols under a uniform, user-friendly graphical interface. Its main purpose is to combine the most frequent modeling steps in a semi-automatic, interactive way, guiding the user from the target protein sequence to the final three-dimensional protein structure. The typical basic homology modeling process is composed of building sequence profiles of the target sequence family, secondary structure prediction, sequence alignment with PDB structures, assisted alignment editing, side-chain prediction and loop building. All of these steps are available through a graphical user interface. MolIDE's user-friendly and streamlined interactive modeling protocol allows the user to focus on the important modeling questions, hiding from the user the raw data generation and conversion steps. MolIDE was designed from the ground up as an open-source, cross-platform, extensible framework. This allows developers to integrate additional third-party programs to MolIDE. http://dunbrack.fccc.edu/molide/molide.php rl_dunbrack@fccc.edu.

  11. The LINE-1 DNA sequences in four mammalian orders predict proteins that conserve homologies to retrovirus proteins.

    PubMed Central

    Fanning, T; Singer, M

    1987-01-01

    Recent work suggests that one or more members of the highly repeated LINE-1 (L1) DNA family found in all mammals may encode one or more proteins. Here we report the sequence of a portion of an L1 cloned from the domestic cat (Felis catus). These data permit comparison of the L1 sequences in four mammalian orders (Carnivore, Lagomorph, Rodent and Primate) and the comparison supports the suggested coding potential. In two separate, noncontiguous regions in the carboxy terminal half of the proteins predicted from the DNA sequences, there are several strongly conserved segments. In one region, these share homology with known or suspected reverse transcriptases, as described by others in rodents and primates. In the second region, closer to the carboxy terminus, the strongly conserved segments are over 90% homologous among the four orders. One of the latter segments is cysteine rich and resembles the putative metal binding domains of nucleic acid binding proteins, including those of TFIIIA and retroviruses. PMID:3562227

  12. The N-terminal sequence of ribosomal protein L10 from the archaebacterium Halobacterium marismortui and its relationship to eubacterial protein L6 and other ribosomal proteins.

    PubMed

    Dijk, J; van den Broek, R; Nasiulas, G; Beck, A; Reinhardt, R; Wittmann-Liebold, B

    1987-08-01

    The amino-terminal sequence of ribosomal protein L10 from Halobacterium marismortui has been determined up to residue 54, using both a liquid- and a gas-phase sequenator. The two sequences are in good agreement. The protein is clearly homologous to protein HcuL10 from the related strain Halobacterium cutirubrum. Furthermore, a weaker but distinct homology to ribosomal protein L6 from Escherichia coli and Bacillus stearothermophilus can be detected. In addition to 7 identical amino acids in the first 36 residues in all four sequences a number of conservative replacements occurs, of mainly hydrophobic amino acids. In this common region the pattern of conserved amino acids suggests the presence of a beta-alpha fold as it occurs in ribosomal proteins L12 and L30. Furthermore, several potential cases of homology to other ribosomal components of the three ur-kingdoms have been found.

  13. Molecular characterization of chikungunya virus from Andhra Pradesh, India & phylogenetic relationship with Central African isolates.

    PubMed

    M Naresh Kumar, C V; Anthony Johnson, A M; R Sai Gopal, D V

    2007-12-01

    Chikungunya virus has caused numerous large outbreaks in India. Suspected blood samples were collected during the epidemic in the Rayalaseema region of Andhra Pradesh and characterized to identify the causative agent. RT-PCR was used to screen the suspected blood samples. Primers were designed to amplify a partial E1 gene, and the amplified fragment was cloned and sequenced. The sequence was analyzed and compared with other geographical isolates to determine their phylogenetic relationship. The sequence was submitted to the GenBank DNA database (accession DQ888620). Comparative nucleotide homology analysis revealed 94.7 ± 3.6% homology of the AP Ra-CTR isolate (CHIKAPRa-CTR) with other isolates of Chikungunya virus at the nucleotide level and 96.8 ± 3.2% homology at the amino acid level. The current epidemic was caused by the Central African genotype of CHIKV, which grouped in the Central African cluster in phylogenetic trees generated from nucleotide and amino acid sequences.

  14. DNA sequences of three beta-1,4-endoglucanase genes from Thermomonospora fusca.

    PubMed Central

    Lao, G; Ghangas, G S; Jung, E D; Wilson, D B

    1991-01-01

    The DNA sequences of the Thermomonospora fusca genes encoding cellulases E2 and E5 and the N-terminal end of E4 were determined. Each sequence contains an identical 14-bp inverted repeat upstream of the initiation codon. There were no significant homologies between the coding regions of the three genes. The E2 gene is 73% identical to the celA gene from Microbispora bispora, but this was the only homology found with other cellulase genes. E2 belongs to a family of cellulases that includes celA from M. bispora, cenA from Cellulomonas fimi, casA from an alkalophilic Streptomyces strain, and cellobiohydrolase II from Trichoderma reesei. E4 shows 44% identity to an avocado cellulase, while E5 belongs to the Bacillus cellulase family. There were strong similarities between the amino acid sequences of the E2 and E5 cellulose binding domains, and these regions also showed homology with C. fimi and Pseudomonas fluorescens cellulose binding domains. PMID:1904434

  15. Msh2 Blocks an Alternative Mechanism for Non-Homologous Tail Removal during Single-Strand Annealing in Saccharomyces cerevisiae

    PubMed Central

    Manthey, Glenn M.; Naik, Nilan; Bailis, Adam M.

    2009-01-01

    Chromosomal translocations are frequently observed in cells exposed to agents that cause DNA double-strand breaks (DSBs), such as ionizing radiation and chemotherapeutic drugs, and are often associated with tumors in mammals. Recently, translocation formation in the budding yeast, Saccharomyces cerevisiae, has been found to occur at high frequencies following the creation of multiple DSBs adjacent to repetitive sequences on non-homologous chromosomes. The genetic control of translocation formation and the chromosome complements of the clones that contain translocations suggest that translocation formation occurs by single-strand annealing (SSA). Among the factors important for translocation formation by SSA is the central mismatch repair (MMR) and homologous recombination (HR) factor, Msh2. Here we describe the effects of several msh2 missense mutations on translocation formation that suggest that Msh2 has separable functions in stabilizing annealed single strands, and removing non-homologous sequences from their ends. Additionally, interactions between the msh2 alleles and a null allele of RAD1, which encodes a subunit of a nuclease critical for the removal of non-homologous tails suggest that Msh2 blocks an alternative mechanism for removing these sequences. These results suggest that Msh2 plays multiple roles in the formation of chromosomal translocations following acute levels of DNA damage. PMID:19834615

  16. Global transformation of erythrocyte properties via engagement of an SH2-like sequence in band 3

    PubMed Central

    Turrini, Francesco M.; Li, Yen-Hsing; Low, Philip S.

    2016-01-01

    Src homology 2 (SH2) domains are composed of weakly conserved sequences of ∼100 aa that bind phosphotyrosines in signaling proteins and thereby mediate intra- and intermolecular protein–protein interactions. In exploring the mechanism whereby tyrosine phosphorylation of the erythrocyte anion transporter, band 3, triggers membrane destabilization, vesiculation, and fragmentation, we discovered an SH2 signature motif positioned between membrane-spanning helices 4 and 5. Evidence that this exposed cytoplasmic sequence contributes to a functional SH2-like domain is provided by observations that: (i) it contains the most conserved sequence of SH2 domains, GSFLVR; (ii) it binds the tyrosine phosphorylated cytoplasmic domain of band 3 (cdb3-PO4) with Kd = 14 nM; (iii) binding of cdb3-PO4 to erythrocyte membranes is inhibited both by antibodies against the SH2 signature sequence and dephosphorylation of cdb3-PO4; (iv) label transfer experiments demonstrate the covalent transfer of photoactivatable biotin from isolated cdb3-PO4 (but not cdb3) to band 3 in erythrocyte membranes; and (v) phosphorylation-induced binding of cdb3-PO4 to the membrane-spanning domain of band 3 in intact cells causes global changes in membrane properties, including (i) displacement of a glycolytic enzyme complex from the membrane, (ii) inhibition of anion transport, and (iii) rupture of the band 3–ankyrin bridge connecting the spectrin-based cytoskeleton to the membrane. Because SH2-like motifs are not retrieved by normal homology searches for SH2 domains, but can be found in many tyrosine kinase-regulated transport proteins using modified search programs, we suggest that related cases of membrane transport proteins containing similar motifs are widespread in nature where they participate in regulation of cell properties. PMID:27856737

  17. Genome-wide discovery of novel and conserved microRNAs in white shrimp (Litopenaeus vannamei).

    PubMed

    Xi, Qian-Yun; Xiong, Yuan-Yan; Wang, Yuan-Mei; Cheng, Xiao; Qi, Qi-En; Shu, Gang; Wang, Song-Bo; Wang, Li-Na; Gao, Ping; Zhu, Xiao-Tong; Jiang, Qing-Yan; Zhang, Yong-Liang; Liu, Li

    2015-01-01

    In recent years, large numbers of conserved and species-specific microRNAs (miRNAs) have been identified in economically important species that lack a full genome sequence. In this study, Solexa deep sequencing and a cross-species miRNA microarray were used to detect miRNAs in white shrimp. By bioinformatics analysis of 7,561,406 high-quality reads representing 325,370 distinct sequences, we identified 239 conserved miRNAs, 14 miRNA* sequences and 20 novel miRNAs. All 20 novel miRNAs were specific to white shrimp and had no homologs in other species. Using the conserved miRNAs from the miRBase database as a query set to search for homologs in shrimp expressed sequence tags (ESTs), we discovered 32 conserved, computationally predicted miRNAs in shrimp. In addition, microarray analysis of shrimp fed a Panax ginseng polysaccharide complex identified 151 conserved miRNAs, of which 18 were significantly upregulated and 49 significantly downregulated. Quantitative reverse transcription PCR (qRT-PCR) analysis was also performed for nine miRNAs in three shrimp tissues (muscle, gill and hepatopancreas), and the results showed that the expression of these miRNAs is tissue-specific. Combining the results of the three methods, we detected 20 novel and 394 conserved miRNAs. Verification with qRT-PCR and Northern blot confirmed the reliability of the data. The study provides the first comprehensive miRNA profile of white shrimp, which includes useful information for future investigations into the function of miRNAs in regulating shrimp development and immunity.

  18. dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts

    PubMed Central

    Vincent, Jonathan; Dai, Zhanwu; Ravel, Catherine; Choulet, Frédéric; Mouzeyar, Said; Bouzidi, M. Fouad; Agier, Marie; Martre, Pierre

    2013-01-01

    The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because it is necessary to mine several unrelated databases. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including annotations for model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, comparison of the annotation of the whole T. aestivum UniGene set along with curated annotations of the two model species assessed the accuracy of the annotation provided by dbWFA. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/ PMID:23660284

  19. Global transformation of erythrocyte properties via engagement of an SH2-like sequence in band 3.

    PubMed

    Puchulu-Campanella, Estela; Turrini, Francesco M; Li, Yen-Hsing; Low, Philip S

    2016-11-29

    Src homology 2 (SH2) domains are composed of weakly conserved sequences of ∼100 aa that bind phosphotyrosines in signaling proteins and thereby mediate intra- and intermolecular protein-protein interactions. In exploring the mechanism whereby tyrosine phosphorylation of the erythrocyte anion transporter, band 3, triggers membrane destabilization, vesiculation, and fragmentation, we discovered an SH2 signature motif positioned between membrane-spanning helices 4 and 5. Evidence that this exposed cytoplasmic sequence contributes to a functional SH2-like domain is provided by observations that: (i) it contains the most conserved sequence of SH2 domains, GSFLVR; (ii) it binds the tyrosine-phosphorylated cytoplasmic domain of band 3 (cdb3-PO4) with Kd = 14 nM; (iii) binding of cdb3-PO4 to erythrocyte membranes is inhibited both by antibodies against the SH2 signature sequence and by dephosphorylation of cdb3-PO4; (iv) label transfer experiments demonstrate the covalent transfer of photoactivatable biotin from isolated cdb3-PO4 (but not cdb3) to band 3 in erythrocyte membranes; and (v) phosphorylation-induced binding of cdb3-PO4 to the membrane-spanning domain of band 3 in intact cells causes global changes in membrane properties, including (i) displacement of a glycolytic enzyme complex from the membrane, (ii) inhibition of anion transport, and (iii) rupture of the band 3-ankyrin bridge connecting the spectrin-based cytoskeleton to the membrane. Because SH2-like motifs are not retrieved by normal homology searches for SH2 domains, but can be found in many tyrosine kinase-regulated transport proteins using modified search programs, we suggest that related cases of membrane transport proteins containing similar motifs are widespread in nature, where they participate in the regulation of cell properties.

  20. Rapid Hypothesis Testing with Candida albicans through Gene Disruption with Short Homology Regions

    PubMed Central

    Wilson, R. Bryce; Davis, Dana; Mitchell, Aaron P.

    1999-01-01

    Disruption of newly identified genes in the pathogen Candida albicans is a vital step in determination of gene function. Several gene disruption methods described previously employ long regions of homology flanking a selectable marker. Here, we describe disruption of C. albicans genes with PCR products that have 50 to 60 bp of homology to a genomic sequence on each end of a selectable marker. We used the method to disrupt two known genes, ARG5 and ADE2, and two sequences newly identified through the Candida genome project, HRM101 and ENX3. HRM101 and ENX3 are homologous to genes in the conserved RIM101 (previously called RIM1) and PacC pathways of Saccharomyces cerevisiae and Aspergillus nidulans. We show that three independent hrm101/hrm101 mutants and two independent enx3/enx3 mutants are defective in filamentation on Spider medium. These observations argue that HRM101 and ENX3 sequences are indeed portions of genes and that the respective gene products have related functions. PMID:10074081

  1. Conservation of the glycoprotein B homologs of the Kaposi’s sarcoma-associated herpesvirus (KSHV/HHV8) and Old World primate rhadinoviruses of chimpanzees and macaques

    PubMed Central

    Bruce, A. Gregory; Horst, Jeremy A.; Rose, Timothy M.

    2016-01-01

    The envelope-associated glycoprotein B (gB) is highly conserved within the Herpesviridae and plays a critical role in viral entry. We analyzed the evolutionary conservation of sequence and structural motifs within the Kaposi’s sarcoma-associated herpesvirus (KSHV) gB and homologs of Old World primate rhadinoviruses belonging to the distinct RV1 and RV2 rhadinovirus lineages. In addition to gB homologs of rhadinoviruses infecting the pig-tailed and rhesus macaques, we cloned and sequenced gB homologs of RV1 and RV2 rhadinoviruses infecting chimpanzees. A structural model of the KSHV gB was determined, and functional motifs and sequence variants were mapped to the model structure. Conserved domains and motifs were identified, including an “RGD” motif that plays a critical role in KSHV binding and entry through the cellular integrin αVβ3. The RGD motif was detected only in RV1 rhadinoviruses, suggesting an important difference in cell tropism between the two rhadinovirus lineages. PMID:27070755

  2. Advances in Homology Protein Structure Modeling

    PubMed Central

    Xiang, Zhexin

    2007-01-01

    Homology modeling plays a central role in determining protein structure in the structural genomics project. The importance of homology modeling has been steadily increasing because of the large gap that exists between the overwhelming number of available protein sequences and experimentally solved protein structures, and also, more importantly, because of the increasing reliability and accuracy of the method. In fact, a protein sequence with over 30% identity to a known structure can often be predicted with an accuracy equivalent to a low-resolution X-ray structure. The recent advances in homology modeling, especially in detecting distant homologues, aligning sequences with template structures, modeling of loops and side chains, as well as detecting errors in a model, have contributed to reliable prediction of protein structure, which was not possible even several years ago. The ongoing efforts in solving protein structures, which can be time-consuming and often difficult, will continue to spur the development of a host of new computational methods that can fill in the gap and further contribute to understanding the relationship between protein structure and function. PMID:16787261

  3. Acid sphingomyelinase possesses a domain homologous to its activator proteins: saposins B and D.

    PubMed Central

    Ponting, C. P.

    1994-01-01

    An N-terminal region of the acid sphingomyelinase sequence (residues 89-165) is shown to be homologous to saposin-type sequences. By analogy with the known functions of saposins, this sphingomyelinase saposin-type domain may possess lipid-binding and/or sphingomyelinase-activator properties. This finding may prove to be important in the understanding of Niemann-Pick disease, which results from sphingomyelinase deficiency. PMID:8003971

  4. Molecular Phylogenetics of Trichostrongylus Species (Nematoda: Trichostrongylidae) from Humans of Mazandaran Province, Iran.

    PubMed

    Sharifdini, Meysam; Heidari, Zahra; Hesari, Zahra; Vatandoost, Sajad; Kia, Eshrat Beigom

    2017-06-01

    The present study was performed to analyze, by molecular methods, the phylogenetic positions of human-infecting Trichostrongylus species in Mazandaran Province, Iran, an endemic area for trichostrongyliasis. DNA was extracted from 7 Trichostrongylus-infected stool samples using an in-house (IH) method. The ITS2-rDNA region was amplified by PCR, and the products were sequenced. Phylogenetic analysis of the nucleotide sequence data was performed using MEGA 5.0 software. Six of the 7 isolates showed high similarity to Trichostrongylus colubriformis, while the remaining one showed high similarity to Trichostrongylus axei reference sequences registered in GenBank. Intra-specific variation within isolates of T. colubriformis and T. axei amounted to 0-1.8% and 0-0.6%, respectively. The Trichostrongylus species obtained in the present study clustered with the relevant reference sequences from previous studies. BLAST analysis indicated 100% sequence identity among all 6 ITS2 sequences of T. colubriformis in the present study and most previously registered T. colubriformis sequences from human, sheep, and goat isolates from Iran, as well as human isolates from Laos, Thailand, and France. The ITS2 sequence of T. axei exhibited 99.4% identity with the human isolate of T. axei from Thailand, sheep isolates from New Zealand and Iran, and a cattle isolate from the USA.
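    The percentages above come from pairwise comparisons of aligned ITS2 sequences. As a minimal sketch of how such figures can be computed, the Python below counts identical sites over the gap-free columns of two pre-aligned sequences and reports percent identity and the uncorrected p-distance. Skipping gapped columns (pairwise deletion) is an assumption made for illustration; MEGA and BLAST can treat gaps differently, and the abstract does not state which distance model was used. The function names are hypothetical.

```python
def compare_aligned(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Return (identical_sites, compared_sites) for two pre-aligned sequences.

    Columns with a gap ('-') in either sequence are skipped (pairwise deletion,
    an illustrative assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    identical = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        identical += a == b  # bool counts as 0 or 1
    return identical, compared


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical sites among gap-free aligned columns."""
    identical, compared = compare_aligned(seq_a, seq_b)
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


def p_distance_percent(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance (proportion of differing sites), as a percentage."""
    return 100.0 - percent_identity(seq_a, seq_b)
```

    For example, `percent_identity("ACGT-ACGTA", "ACGTTACGAA")` compares 9 gap-free columns, 8 of which are identical.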

  5. Transitive homology-guided structural studies lead to discovery of Cro proteins with 40% sequence identity but different folds

    PubMed Central

    Roessler, Christian G.; Hall, Branwen M.; Anderson, William J.; Ingram, Wendy M.; Roberts, Sue A.; Montfort, William R.; Cordes, Matthew H. J.

    2008-01-01

    Proteins that share common ancestry may differ in structure and function because of divergent evolution of their amino acid sequences. For a typical diverse protein superfamily, the properties of a few scattered members are known from experiment. A satisfying picture of functional and structural evolution in relation to sequence changes, however, may require characterization of a larger, well chosen subset. Here, we employ a “stepping-stone” method, based on transitive homology, to target sequences intermediate between two related proteins with known divergent properties. We apply the approach to the question of how new protein folds can evolve from preexisting folds and, in particular, to an evolutionary change in secondary structure and oligomeric state in the Cro family of bacteriophage transcription factors, initially identified by sequence-structure comparison of distant homologs from phages P22 and λ. We report crystal structures of two Cro proteins, Xfaso 1 and Pfl 6, with sequences intermediate between those of P22 and λ. The domains show 40% sequence identity but differ by switching of α-helix to β-sheet in a C-terminal region spanning ≈25 residues. Sedimentation analysis also suggests a correlation between helix-to-sheet conversion and strengthened dimerization. PMID:18227506

  6. The cytochrome oxidase subunit I and subunit III genes in Oenothera mitochondria are transcribed from identical promoter sequences

    PubMed Central

    Hiesel, Rudolf; Schobel, Werner; Schuster, Wolfgang; Brennicke, Axel

    1987-01-01

    Two loci encoding subunit III of the cytochrome oxidase (COX) in Oenothera mitochondria have been identified from a cDNA library of mitochondrial transcripts. A 657-bp sequence block upstream from the open reading frame is also present in the two copies of the COX subunit I gene and is presumably involved in homologous sequence rearrangement. The proximal points of sequence rearrangements are located 3 bp upstream from the COX I and 1139 bp upstream from the COX III initiation codons. The 5'-termini of both COX I and COX III mRNAs have been mapped in this common sequence confining the promoter region for the Oenothera mitochondrial COX I and COX III genes to the homologous sequence block. PMID:15981332

  7. Computational allergenicity prediction of transgenic proteins expressed in genetically modified crops.

    PubMed

    Verma, Alok Kumar; Misra, Amita; Subash, Swarna; Das, Mukul; Dwivedi, Premendra D

    2011-09-01

    Development of genetically modified (GM) crops is increasing as a means to improve food quality, increase harvest yields, and reduce dependency on chemical pesticides. Before release onto the market, such crops should be scrutinized for safety. Guidelines from several organizations, such as ILSI, WHO Codex and OECD, are available for the allergenicity evaluation of transgenics, and sequence homology analysis is the first test used to determine the allergenic potential of inserted proteins. To test and validate this approach, 312 allergenic, 100 non-allergenic, and 48 inserted proteins were assessed for sequence similarity using 8-mer, 80-mer, and full FASTA searches. In these sequence homology analyses, ~94% of the allergenic proteins gave exact matches for 8-mer and 80-mer homology. However, 20 allergenic proteins showed non-allergenic behavior, and seven of the 100 non-allergenic proteins qualified as allergens. None of the inserted proteins demonstrated allergenic behavior. To improve predictability, proteins showing anomalous behavior were tested separately with Algpred and ADFS. Use of the Algpred and ADFS software substantially reduced the tendency toward false prediction (74-78%). In conclusion, routine sequence homology analysis needs to be coupled with another bioinformatic method, such as ADFS or Algpred, to reduce false allergenicity predictions for novel proteins.
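    The 8-mer and 80-mer screens described above can be sketched as follows. This is a simplified illustration, not the pipeline used in the study: the 8-mer test looks for any exactly shared 8-residue peptide, and the 80-mer test is approximated by the best ungapped percent identity between any two 80-residue windows, whereas guideline-based screens typically align windows with FASTA. The 35% identity cutoff in `is_flagged` is an assumption based on commonly cited Codex guidance, and all function names are hypothetical.

```python
def shared_kmers(query: str, allergen: str, k: int = 8) -> set[str]:
    """Return every length-k peptide that occurs exactly in both sequences."""
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    allergen_kmers = {allergen[i:i + k] for i in range(len(allergen) - k + 1)}
    return query_kmers & allergen_kmers


def max_window_identity(query: str, allergen: str, window: int = 80) -> float:
    """Best ungapped percent identity between any two `window`-length segments.

    Simplification: guideline screens align windows with gaps (e.g. FASTA).
    Returns 0.0 if either sequence is shorter than `window`.
    """
    best = 0.0
    for i in range(len(query) - window + 1):
        q_seg = query[i:i + window]
        for j in range(len(allergen) - window + 1):
            a_seg = allergen[j:j + window]
            matches = sum(x == y for x, y in zip(q_seg, a_seg))
            best = max(best, 100.0 * matches / window)
    return best


def is_flagged(query: str, allergen: str, identity_cutoff: float = 35.0) -> bool:
    """Flag a query that shares an exact 8-mer or exceeds the 80-mer identity cutoff.

    The 35% default is an assumption (commonly cited Codex criterion).
    """
    if shared_kmers(query, allergen, k=8):
        return True
    return max_window_identity(query, allergen, window=80) > identity_cutoff
```

    The window scan compares every pair of windows, so its cost grows with the product of the two protein lengths; it is meant to show the logic rather than to be fast.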

  8. Modular and configurable optimal sequence alignment software: Cola.

    PubMed

    Zamani, Neda; Sundström, Görel; Höppner, Marc P; Grabherr, Manfred G

    2014-01-01

    The fundamental challenge in optimally aligning homologous sequences is to define a scoring scheme that best reflects the underlying biological processes. Maximising the overall number of matches in the alignment does not always reflect the patterns by which nucleotides mutate. Efficiently implemented algorithms that can be parameterised to accommodate more complex nonlinear scoring schemes are thus desirable. We present Cola, alignment software that implements different optimal alignment algorithms, also allowing for scoring contiguous matches of nucleotides in a nonlinear manner. The latter places more emphasis on short, highly conserved motifs, and less on the surrounding nucleotides, which can be more divergent. To illustrate the differences, we report results from aligning 14,100 sequences from 3' untranslated regions of human genes to 25 of their mammalian counterparts, where we found that a nonlinear scoring scheme is more consistent than a linear scheme in detecting short, conserved motifs. Cola is freely available under LGPL from https://github.com/nedaz/cola.
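    To illustrate the difference between linear and nonlinear scoring of contiguous matches, the sketch below scores a gap-free pairwise alignment in two ways: the linear score counts matching columns, while the nonlinear score raises the length of each run of consecutive matches to a power greater than 1, so one long conserved run outscores the same number of scattered matches. The exponent of 2 and the function names are assumptions for illustration only; this is not Cola's actual scoring function, which the abstract does not specify.

```python
from itertools import groupby


def match_runs(a: str, b: str) -> list[int]:
    """Lengths of maximal runs of identical positions in a gap-free alignment."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    flags = [x == y for x, y in zip(a, b)]
    # groupby splits the match/mismatch flags into consecutive blocks
    return [len(list(block)) for is_match, block in groupby(flags) if is_match]


def linear_score(a: str, b: str) -> int:
    """Every match counts once, regardless of its neighbours."""
    return sum(match_runs(a, b))


def nonlinear_score(a: str, b: str, exponent: float = 2.0) -> float:
    """Each run of k consecutive matches contributes k ** exponent (hypothetical scheme)."""
    return sum(run ** exponent for run in match_runs(a, b))


reference = "ACGTACGT"
contiguous = "ACGTACTA"  # six matches in one run
scattered = "ACTTACTT"   # six matches split into runs of 2, 3 and 1
```

    Both candidate alignments have six matches, so `linear_score` cannot tell them apart, whereas `nonlinear_score` rewards the single run of six (6² = 36) over the scattered runs (2² + 3² + 1² = 14).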

  9. Differential display RT PCR of total RNA from human foreskin fibroblasts for investigation of androgen-dependent gene expression

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nitsche, E.M.; Moquin, A.; Adams, P.S.

    1996-05-03

    Male sexual differentiation is a process that involves androgen action via the androgen receptor. Defects in the androgen receptor, many resulting from point mutations in the androgen receptor gene, lead to varying degrees of impaired masculinization in chromosomally male individuals. To date, no specific androgen-regulated morphogens involved in this process have been identified, and no marker genes are known that would help to predict further virilization in infants with partial androgen insensitivity. In the present study we first show data on androgen-regulated gene expression investigated by differential display reverse transcription PCR (dd RT PCR) on total RNA from human neonatal genital skin fibroblasts cultured in the presence or absence of 100 nM testosterone. Using three different primer combinations, 54 cDNAs appeared to be regulated by androgens. Most of these sequences show the characteristics of expressed mRNAs but showed no homology to sequences in the database. However, 15 clones with significant homology to previously cloned sequences were identified. Seven cDNAs appear to be induced by androgen withdrawal. Of these, five are similar to ESTs (expressed sequence tags) from unknown genes; the other two show significant homology to the cDNAs of ubiquitin and human guanylate binding protein 2 (GBP-2). In addition, we have identified 8 cDNA clones which show homologies to other sequences in the database and appear to be upregulated in the presence of testosterone. Three differentially expressed sequences show significant homology to the cDNAs of L-plastin and one to the cDNA of testican. This latter gene codes for a proteoglycan involved in cell social behavior and is therefore of special interest in this context. The results of this study are of interest for further investigation of normal and disturbed androgen-dependent gene expression. 49 refs., 2 figs., 5 tabs.

  10. orthoFind Facilitates the Discovery of Homologous and Orthologous Proteins.

    PubMed

    Mier, Pablo; Andrade-Navarro, Miguel A; Pérez-Pulido, Antonio J

    2015-01-01

    Finding homologous and orthologous protein sequences is often the first step in evolutionary studies, annotation projects, and functional complementation experiments. Despite the many computational tools currently available, there is still a need for easy-to-use tools that provide functional information. Here we present orthoFind, a new web application that allows a quick search for homologous and orthologous proteins given one or more query sequences, supports recurrent and exhaustive searches against reference proteomes, and can include user databases. It addresses the protein multidomain problem by searching for homologs with the same domain architecture, and provides a simple functional analysis of the results to help in the annotation process. orthoFind is easy to use and has been shown to provide accurate results with different datasets. Availability: http://www.bioinfocabd.upo.es/orthofind/.

  11. Identification and analysis of multigene families by comparison of exon fingerprints.

    PubMed

    Brown, N P; Whittaker, A J; Newell, W R; Rawlings, C J; Beck, S

    1995-06-02

    Gene families are often recognised by sequence homology, using similarity searching to find relationships. However, genomic sequence data provide gene architectural information not used by conventional search methods. In particular, intron positions and phases are expected to be relatively conserved features, because mis-splicing and reading frame shifts should be selected against. A fast search technique capable of detecting possible weak sequence homologies apparent at the intron/exon level of gene organization is presented for comparing spliceosomal genes and gene fragments. FINEX compares strings of exons delimited by intron/exon boundary positions and intron phases (exon fingerprints) using a global dynamic programming algorithm with a combined intron phase identity and exon size dissimilarity score. Exon fingerprints are typically two orders of magnitude smaller than their nucleic acid sequence counterparts, giving rise to fast search times: a ranked search of a typical three-exon fingerprint against a library of 6755 fingerprints completes in under 30 seconds on an ordinary workstation, while the worst case, the largest fingerprint of 52 exons, completes in just over one minute. The short "sequence" length of exon fingerprints is compensated for by the large exon alphabet, compounded of intron phase types and a wide range of exon sizes, the latter contributing the most information to alignments. In some searches FINEX performs better than conventional methods, finding matches with similar exon organization but low sequence similarity. A search using human serum albumin finds all members of the multigene family in the FINEX database at the top of the search ranking, despite very low amino acid percentage identities between family members. The method should complement conventional sequence searching and alignment techniques, offering a means of identifying otherwise hard-to-detect homologies where genomic data are available.
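    The core idea of FINEX, global alignment of exon fingerprints rather than nucleotide strings, can be sketched with a standard Needleman-Wunsch recurrence. In this hypothetical version each exon is a `(length_bp, phase)` pair, where the phase is that of the following intron and -1 marks the last exon; a pair of exons scores a bonus for identical phase minus a penalty proportional to their relative size difference, and unaligned exons pay a fixed gap penalty. The weights, the gap penalty and the form of the score are assumptions; the abstract does not give FINEX's actual scoring parameters.

```python
Exon = tuple[int, int]  # (exon length in bp, phase of the following intron; -1 for the last exon)


def exon_pair_score(a: Exon, b: Exon, phase_bonus: float = 1.0,
                    size_weight: float = 2.0) -> float:
    """Hypothetical score: reward identical intron phase, penalise relative size difference."""
    (len_a, phase_a), (len_b, phase_b) = a, b
    phase_term = phase_bonus if phase_a == phase_b else -phase_bonus
    size_dissimilarity = abs(len_a - len_b) / max(len_a, len_b)  # in [0, 1) for positive lengths
    return phase_term - size_weight * size_dissimilarity


def align_fingerprints(fp_a: list[Exon], fp_b: list[Exon], gap: float = -1.0) -> float:
    """Global (Needleman-Wunsch) alignment score of two exon fingerprints."""
    n, m = len(fp_a), len(fp_b)
    # dp[i][j] = best score aligning the first i exons of fp_a with the first j of fp_b
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + exon_pair_score(fp_a[i - 1], fp_b[j - 1]),  # align two exons
                dp[i - 1][j] + gap,  # exon of fp_a left unaligned
                dp[i][j - 1] + gap,  # exon of fp_b left unaligned
            )
    return dp[n][m]


# Ranked search of a query fingerprint against a small, made-up library.
query = [(120, 0), (85, 1), (200, -1)]
library = {"gene_x": [(118, 0), (85, 1), (210, -1)], "gene_y": [(40, 2), (300, -1)]}
ranked = sorted(library, key=lambda name: align_fingerprints(query, library[name]), reverse=True)
```

    Because fingerprints contain a handful of exons rather than thousands of nucleotides, each dynamic programming table is tiny, which is consistent with the fast library searches the abstract reports.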

  12. The Inner Membrane Complex Sub-compartment Proteins Critical for Replication of the Apicomplexan Parasite Toxoplasma gondii Adopt a Pleckstrin Homology Fold*

    PubMed Central

    Tonkin, Michelle L.; Beck, Josh R.; Bradley, Peter J.; Boulanger, Martin J.

    2014-01-01

    Toxoplasma gondii, an apicomplexan parasite prevalent in developed nations, infects up to one-third of the human population. The success of this parasite depends on several unique structures including an inner membrane complex (IMC) that lines the interior of the plasma membrane and contains proteins important for gliding motility and replication. Of these proteins, the IMC sub-compartment proteins (ISPs) have recently been shown to play a role in asexual T. gondii daughter cell formation, yet the mechanism is unknown. Complicating mechanistic characterization of the ISPs is a lack of sequence identity with proteins of known structure or function. In support of elucidating the function of ISPs, we first determined the crystal structures of representative members TgISP1 and TgISP3 to a resolution of 2.10 and 2.32 Å, respectively. Structural analysis revealed that both ISPs adopt a pleckstrin homology fold often associated with phospholipid binding or protein-protein interactions. Substitution of basic for hydrophobic residues in the region that overlays with phospholipid binding in related pleckstrin homology domains, however, suggests that ISPs do not retain phospholipid binding activity. Consistent with this observation, biochemical assays revealed no phospholipid binding activity. Interestingly, mapping of conserved surface residues combined with crystal packing analysis indicates that TgISPs have functionally repurposed the phospholipid-binding site likely to coordinate protein partners. Recruitment of larger protein complexes may also be aided through avidity-enhanced interactions resulting from multimerization of the ISPs. Overall, we propose a model where TgISPs recruit protein partners to the IMC to ensure correct progression of daughter cell formation. PMID:24675080

  13. Role of HERP and a HERP-related protein in HRD1-dependent protein degradation at the endoplasmic reticulum.

    PubMed

    Huang, Chih-Hsiang; Chu, Yue-Ru; Ye, Yihong; Chen, Xin

    2014-02-14

    Misfolded proteins of the endoplasmic reticulum (ER) are retrotranslocated to the cytosol and degraded by the proteasome via a process termed ER-associated degradation (ERAD). The precise mechanism of retrotranslocation is unclear. Here, we use several lumenal ERAD substrates targeted for degradation by the ubiquitin ligase HRD1 including SHH (sonic hedgehog) and NHK (null Hong Kong α1-antitrypsin) to study the geometry, organization, and regulation of the HRD1-containing ERAD machinery. We report a new HRD1-associated membrane protein named HERP2, which is homologous to the previously identified HRD1 partner HERP1. Despite their sequence homology, HERP2 is constitutively expressed in cells, whereas HERP1 is highly induced by ER stress. We find that these proteins are required for efficient degradation of both glycosylated and nonglycosylated SHH proteins as well as NHK. In cells depleted of HERPs, SHH proteins are largely trapped inside the ER with a fraction of the stabilized SHH protein bound to the HRD1-SEL1L ligase complex. Ubiquitination of SHH is significantly attenuated in the absence of HERPs, suggesting a defect in retrotranslocation. Both HERP proteins interact with HRD1 through a region located in the cytosol. However, unlike their homolog in Saccharomyces cerevisiae, HERPs do not regulate HRD1 stability or oligomerization status. Instead, they help recruit DERL2 to the HRD1-SEL1L complex. Additionally, the UBL domain of HERP1 also seems to have a function independent of DERL2 recruitment in ERAD. Our studies have revealed a critical scaffolding function for mammalian HERP proteins that is required for forming an active retrotranslocation complex containing HRD1, SEL1L, and DERL2.

  14. A deep learning framework for improving long-range residue-residue contact prediction using a hierarchical strategy.

    PubMed

    Xiong, Dapeng; Zeng, Jianyang; Gong, Haipeng

    2017-09-01

    Residue-residue contacts are of great value for protein structure prediction, since contact information, especially from long-range residue pairs, can significantly reduce the complexity of conformational sampling in practice. Despite progress in the past decade on protein targets with abundant homologous sequences, accurate contact prediction for proteins with limited sequence information remains far from satisfactory. Methods for these hard targets still need further improvement. We present a computational program, DeepConPred, which includes a pipeline of two novel deep-learning-based methods (DeepCCon and DeepRCon) as well as a contact refinement step, to improve the prediction of long-range residue contacts from primary sequences. Compared with previous prediction approaches, our framework employs an effective scheme to identify optimal and important features for contact prediction, and was trained only with coevolutionary information derived from a limited number of homologous sequences to ensure robustness and usefulness for hard targets. Independent tests showed that 59.33%/49.97%, 64.39%/54.01% and 70.00%/59.81% of the top L/5, top L/10 and top 5 predictions were correct for CASP10/CASP11 proteins, respectively. In general, our algorithm ranked as one of the best methods for CASP targets. All source data and code are available at http://166.111.152.91/Downloads.html . Contact: hgong@tsinghua.edu.cn or zengjy321@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
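    The "top L/5" and "top L/10" figures above are precision values: predicted contacts are ranked by confidence, the best L/k are kept (L being the sequence length), and the fraction that are true contacts is reported. The sketch below computes this for long-range pairs. The minimum sequence separation of 24 residues follows a common CASP convention for "long-range" and is an assumption here, as is dividing by the number of predictions actually available when fewer than L/k exist; the function name is hypothetical.

```python
def top_lk_precision(predictions, true_contacts, length, k=5, min_separation=24):
    """Precision of the top-L/k long-range contact predictions.

    predictions: iterable of (i, j, score) with residue indices i and j.
    true_contacts: set of (i, j) pairs stored with i < j.
    """
    # Keep only long-range pairs (assumed cutoff: |i - j| >= min_separation).
    long_range = [(i, j, s) for i, j, s in predictions if abs(i - j) >= min_separation]
    # Rank by predicted confidence, highest first.
    long_range.sort(key=lambda t: t[2], reverse=True)
    top = long_range[: max(1, length // k)]
    if not top:
        return 0.0
    # Normalise each pair to (smaller, larger) before looking it up.
    correct = sum((min(i, j), max(i, j)) in true_contacts for i, j, _ in top)
    return correct / len(top)
```

    A "top 5" figure would use the same ranking truncated to five pairs instead of L/k.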

  15. Metagenomic ventures into outer sequence space.

    PubMed

    Dutilh, Bas E

    Sequencing DNA or RNA directly from the environment often yields many reads that have no homologs in the database. These are referred to as "unknowns," and reflect the vast unexplored microbial sequence space of our biosphere, also known as "biological dark matter." However, unknowns also exist because metagenomic datasets are not optimally mined. Under pressure to publish and move on, researchers often leave the unknown sequences unexamined and draw conclusions only from reads with annotated homologs. This can cause abundant and widespread genomes to be overlooked, such as the recently discovered human gut bacteriophage crAssphage. The unknowns may be enriched for bacteriophage sequences, the most abundant and genetically diverse component of the biosphere and of sequence space. However, the actual size of biological sequence space remains an open question. The de novo assembly of shotgun metagenomes is the most powerful tool to address it.

  16. The challenge of annotating protein sequences: The tale of eight domains of unknown function in Pfam.

    PubMed

    Goonesekere, Nalin C W; Shipely, Krysten; O'Connor, Kevin

    2010-06-01

    The Pfam database is an important tool in genome annotation, since it provides a collection of curated protein families. However, a subset of these families, known as domains of unknown function (DUFs), remains poorly characterized. We have related sequences from DUF404, DUF407, DUF482, DUF608, DUF810, DUF853, DUF976 and DUF1111 to homologs in PDB, within the midnight zone (9-20%) of sequence identity. These relationships were extended to provide functional annotation by sequence analysis and model building. Also described are examples of residue plasticity within enzyme active sites, and change of function within homologous sequences of a DUF. Copyright 2010 Elsevier Ltd. All rights reserved.

  17. Analysis of eight out genes in a cluster required for pectic enzyme secretion by Erwinia chrysanthemi: sequence comparison with secretion genes from other gram-negative bacteria.

    PubMed Central

    Lindeberg, M; Collmer, A

    1992-01-01

    Many extracellular proteins produced by Erwinia chrysanthemi require the out gene products for transport across the outer membrane. In a previous report (S. Y. He, M. Lindeberg, A. K. Chatterjee, and A. Collmer, Proc. Natl. Acad. Sci. USA 88:1079-1083, 1991) cosmid pCPP2006, sufficient for secretion of Erwinia chrysanthemi extracellular proteins by Escherichia coli, was partially sequenced, revealing four out genes sharing high homology with pulH through pulK from Klebsiella oxytoca. The nucleotide sequence of eight additional out genes reveals homology with pulC through pulG, pulL, pulM, pulO, and other genes involved in secretion by various gram-negative bacteria. Although signal sequences and hydrophobic regions are generally conserved between Pul and Out proteins, four out genes contain unique inserts, a pulN homolog is not present, and outO appears to be transcribed separately from outC through outM. The sequenced region was subcloned, and an additional 7.6-kb region upstream was identified as being required for secretion in E. coli. out gene homologs were found on Erwinia carotovora cosmid clone pAKC651 but were not detected in E. coli. The outC-through-outM operon is weakly induced by polygalacturonic acid and strongly expressed in the early stationary phase. The out and pul genes are highly similar in sequence, hydropathic properties, and overall arrangement but differ in both transcriptional organization and the nature of their induction. PMID:1429461

  18. H-2RIIBP, a member of the nuclear hormone receptor superfamily that binds to both the regulatory element of major histocompatibility class I genes and the estrogen response element.

    PubMed

    Hamada, K; Gleason, S L; Levi, B Z; Hirschfeld, S; Appella, E; Ozato, K

    1989-11-01

    Transcription of major histocompatibility complex (MHC) class I genes is regulated by the conserved MHC class I regulatory element (CRE). The CRE has two factor-binding sites, region I and region II, both of which elicit enhancer function. By screening a mouse lambda gt 11 library with the CRE as a probe, we isolated a cDNA clone that encodes a protein capable of binding to region II of the CRE. This protein, H-2RIIBP (H-2 region II binding protein), bound to the native region II sequence, but not to other MHC cis-acting sequences or to mutant region II sequences, similar to the naturally occurring region II factor in mouse cells. The deduced amino acid sequence of H-2RIIBP revealed two putative zinc fingers homologous to the DNA-binding domain of steroid/thyroid hormone receptors. Although sequence similarity in other regions was minimal, H-2RIIBP has apparent modular domains characteristic of the nuclear hormone receptors. Further analyses showed that both H-2RIIBP and the natural region II factor bind to the estrogen response element (ERE) of the vitellogenin A2 gene. The ERE is composed of a palindrome, and half of this palindrome resembles the region II binding site of the MHC CRE. These results indicate that H-2RIIBP (i) is a member of the superfamily of nuclear hormone receptors and (ii) may regulate not only MHC class I genes but also genes containing the ERE and related sequences. Sequences homologous to the H-2RIIBP gene are widely conserved in the animal kingdom. H-2RIIBP mRNA is expressed in many mouse tissues, in agreement with the distribution of the natural region II factor.

  19. Exploring the genome of the salt-marsh Spartina maritima (Poaceae, Chloridoideae) through BAC end sequence analysis.

    PubMed

    Ferreira de Carvalho, J; Chelaifa, H; Boutte, J; Poulain, J; Couloux, A; Wincker, P; Bellec, A; Fourment, J; Bergès, H; Salmon, A; Ainouche, M

    2013-12-01

    Spartina species play an important ecological role on salt marshes. Spartina maritima is an Old-World species distributed along the European and North-African Atlantic coasts. This hexaploid species (2n = 6x = 60, 2C = 3,700 Mb) hybridized with different Spartina species introduced from the American coasts, which resulted in the formation of new invasive hybrids and allopolyploids. Thus, S. maritima is of evolutionary and ecological interest. However, genomic information is dramatically lacking in this genus. In an effort to develop genomic resources, we analysed 40,641 high-quality bacterial artificial chromosome-end sequences (BESs), representing 26.7 Mb of the S. maritima genome. BESs were searched for sequence homology against known databases. Known repeats account for 16.91% of the BESs, the majority being long terminal repeat (LTR) retrotransposons (13.67%). Non-LTR retrotransposons represent 0.75% and DNA transposons 0.99%, whereas small RNA, simple repeats and low-complexity sequences account for 1.38% of the analysed BESs. In addition, 4,285 simple sequence repeats were detected. When searched against the coding sequence database of Sorghum bicolor, 6,809 BESs found homology, accounting for 17.1% of all BESs. Comparative genomics with related genera reveals that microsynteny is better conserved with S. bicolor than with other sequenced Poaceae, with 37.6% of the paired matching BESs correctly oriented on the chromosomes. We did not observe large macrosyntenic rearrangements using the mapping strategy employed. However, some regions appeared to have experienced rearrangements when comparing Spartina to Sorghum and to Oryza. This work represents the first overview of the S. maritima genome in terms of its coding and repetitive components. The syntenic relationships with other grass genomes examined here help clarify evolution in Poaceae, S. maritima being part of the poorly known Chloridoideae subfamily.

  20. A CK2 site is reversibly phosphorylated in the photosystem II subunit CP29.

    PubMed

    Testi, M G; Croce, R; Polverino-De Laureto, P; Bassi, R

    1996-12-16

    Protein phosphorylation is a major mechanism in the regulation of protein function. In chloroplast thylakoids several photosystem II subunits, including the major antenna light-harvesting complex II and several core complex components, are reversibly phosphorylated depending on the redox state of the electron carriers. A previously unknown reversible phosphorylation event has recently been described on the CP29 subunit which leads to conformational changes and protection from cold stress (Bergantino, E., Dainese, P., Cerovic, Z., Sechi, S. and Bassi, R. (1995) J. Biol. Chem. 270, 8474-8481). In this study, we have identified the phosphorylation site on the N-terminal, stroma-exposed domain, showing that it is located in a sequence not homologous to the other members of the Lhc family. The phosphorylated sequence is unique in chloroplast membranes since it meets the requirements for CK2 (casein kinase II) kinases. The possibility that this phosphorylation is involved in a signal transduction pathway is discussed.

  1. Tutorial on Protein Ontology Resources

    PubMed Central

    Arighi, Cecilia; Drabkin, Harold; Christie, Karen R.; Ross, Karen; Natale, Darren

    2017-01-01

    The Protein Ontology (PRO) is the reference ontology for proteins in the Open Biomedical Ontologies (OBO) foundry and consists of three sub-ontologies representing protein classes of homologous genes, proteoforms (e.g., splice isoforms, sequence variants, and post-translationally modified forms), and protein complexes. PRO defines classes of proteins and protein complexes, both species-specific and species non-specific, and indicates their relationships in a hierarchical framework, supporting accurate protein annotation at the appropriate level of granularity, analyses of protein conservation across species, and semantic reasoning. In this first section of this chapter, we describe the PRO framework including categories of PRO terms and the relationship of PRO to other ontologies and protein resources. Next, we provide a tutorial about the PRO website (proconsortium.org) where users can browse and search the PRO hierarchy, view reports on individual PRO terms, and visualize relationships among PRO terms in a hierarchical table view, a multiple sequence alignment view, and a Cytoscape network view. Finally, we describe several examples illustrating the unique and rich information available in PRO. PMID:28150233

  2. 3'-terminal sequence of a small round structured virus (SRSV) in Japan.

    PubMed

    Utagawa, E T; Takeda, N; Inouye, S; Kasuga, K; Yamazaki, S

    1994-01-01

    We determined the nucleotide sequence of about 1,000 bases from the 3'-terminus of a small round structured virus (SRSV), which caused a gastroenteritis outbreak in Chiba Prefecture, Japan, in 1987. The sequence was compared with the corresponding sequence region of Norwalk virus; it consisted of a part of the open reading frame 2 (ORF2), whole ORF3, and 3'-noncoding region (NCR). The 624-base-long ORF3 had sequence homology of 68% with the corresponding region of Norwalk virus. (The amino acid sequence homology was 74%.) The 94-base-long NCR had 65% homology with Norwalk virus. We then selected two consensus-sequence portions in the above sequence between Chiba and Norwalk viruses for primers in the reverse transcriptase-polymerase chain reaction (RT-PCR). Using this primer set, we detected 669-bp bands in agarose gel electrophoresis of RT-PCR products from feces containing Chiba or Norwalk viruses. Furthermore, in Southern hybridization with Chiba probes which were labeled with digoxigenin-dUTP in PCR, the bands of the two viruses were clearly stained under a low stringency condition. Since both Chiba and Norwalk viruses were detected by the above primer set although they are geographically and chronologically different viruses, our primer-pair may be useful for detection of a broad range of SRSVs which cause gastroenteritis in different areas.
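
    Choosing primers from "consensus-sequence portions" amounts to finding stretches where the two aligned genomes agree closely. The minimal sketch below assumes the Chiba and Norwalk sequences are already aligned, with '-' marking gaps. The function name, window width and identity cutoff are illustrative defaults, not values from the study. Real primer design would also check melting temperature, GC content and secondary structure.

```python
from __future__ import annotations


def conserved_windows(aligned_a: str, aligned_b: str, width: int = 20,
                      min_identity: float = 0.9) -> list[tuple[int, int, float]]:
    """Return (start, end, identity) for gap-free windows meeting min_identity.

    Coordinates are 0-based and end-exclusive on the alignment columns.
    Hypothetical helper for illustration, not the authors' procedure.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for start in range(len(aligned_a) - width + 1):
        win_a = aligned_a[start:start + width]
        win_b = aligned_b[start:start + width]
        # Skip windows containing gaps: a primer must anneal to contiguous bases.
        if "-" in win_a or "-" in win_b:
            continue
        matches = sum(1 for x, y in zip(win_a, win_b) if x == y)
        identity = matches / width
        if identity >= min_identity:
            hits.append((start, start + width, identity))
    return hits
```

    Overlapping windows are reported individually. Merging adjacent hits into longer conserved blocks would be a natural next step before picking primer sites.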

  3. tRNAmodpred: a computational method for predicting posttranscriptional modifications in tRNAs

    PubMed Central

    Machnicka, Magdalena A.; Dunin-Horkawicz, Stanislaw; de Crécy-Lagard, Valerie; Bujnicki, Janusz M.

    2016-01-01

    tRNA molecules contain numerous chemically altered nucleosides, which are formed by enzymatic modification of the primary transcripts during the complex tRNA maturation process. Some of the modifications are introduced by single reactions, while others require a complex series of reactions carried out by several different enzymes. The location and distribution of various types of modifications vary greatly between different tRNA molecules, organisms and organelles. We have developed a computational method, tRNAmodpred, for predicting modifications in tRNA sequences. Briefly, our method takes as input one or more unmodified tRNA sequences and a set of protein sequences corresponding to the proteome of a cell. It then identifies homologs of known tRNA modification enzymes in the proteome, predicts tRNA modification activities and maps them onto known pathways of RNA modification from the MODOMICS database. In this way, theoretically possible modification pathways are identified, and products of these modification reactions are proposed for the query tRNAs. This method allows for predicting modification patterns for newly sequenced genomes as well as for checking the tentative modification status of tRNAs from one species treated with enzymes from another source, e.g. to predict the possible modifications of eukaryotic tRNAs expressed in bacteria. tRNAmodpred is freely available as a web server at http://genesilico.pl/trnamodpred/. PMID:27016142
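
    The prediction step described above maps detected enzyme activities onto known modification pathways. It can be pictured as a reachability search: start from an unmodified nucleoside and apply every reaction whose enzyme was found in the proteome until no new products appear. The sketch below is a hypothetical toy in Python, not tRNAmodpred's implementation. The reaction table, enzyme family names and modification labels are invented placeholders (MODOMICS holds the real pathway data). It also assumes homolog detection has already produced the set of enzyme families present.

```python
from __future__ import annotations

from collections import deque

# Hypothetical reaction table: (substrate, enzyme family, product).
# Placeholder names only; a real tool would draw these from MODOMICS.
REACTIONS = [
    ("U", "EnzA", "modU1"),
    ("modU1", "EnzB", "modU2"),
    ("modU2", "EnzC", "modU3"),
    ("U", "EnzD", "modU4"),
]


def reachable_modifications(start: str, enzymes_present: set[str]) -> set[str]:
    """Return every nucleoside state reachable from `start` using only
    reactions catalysed by enzyme families detected in the proteome."""
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search over modification states
        current = queue.popleft()
        for substrate, enzyme, product in REACTIONS:
            if substrate == current and enzyme in enzymes_present and product not in seen:
                seen.add(product)
                queue.append(product)
    return seen
```

    In this toy model, lacking an early enzyme (EnzA) blocks all of its downstream products even if later enzymes (EnzB, EnzC) are present.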

  4. Genetic stability of gene targeted immunoglobulin loci. I. Heavy chain isotype exchange induced by a universal gene replacement vector.

    PubMed Central

    Kardinal, C; Selmayr, M; Mocikat, R

    1996-01-01

    Gene targeting at the immunoglobulin loci of B cells is an efficient tool for studying immunoglobulin expression or generating chimeric antibodies. We have shown that vector integration induced by human immunoglobulin G1 (IgG1) insertion vectors results in subsequent vector excision mediated by the duplicated target sequence, whereas replacement events which could be induced by the same constructs remain stable. We could demonstrate that the distribution of the vector homology strongly influences the genetic stability obtained. To this end we developed a novel type of a heavy chain replacement vector making use of the heavy chain class switch recombination sequence. Despite the presence of a two-sided homology this construct is universally applicable irrespective of the constant gene region utilized by the B cell. In comparison to an integration vector the frequency of stable incorporation was strongly increased, but we still observed vector excision, although at a markedly reduced rate. The latter events even occurred with circular constructs. Linearization of the construct at various sites and the comparison with an integration vector that carries the identical homology sequence, but differs in the distribution of homology, revealed the following features of homologous recombination of immunoglobulin genes: (i) the integration frequency is only determined by the length of the homology flank where the cross-over takes place; (ii) a 5' flank that does not meet the minimum requirement of homology length cannot be complemented by a sufficient 3' flank; (iii) free vector ends play a role for integration as well as for replacement targeting; (iv) truncating recombination events are suppressed in the presence of two flanks. Furthermore, we show that the switch region that was used as 3' flank is non-functional in an inverted orientation. PMID:8958041

  5. Genetic stability of gene targeted immunoglobulin loci. I. Heavy chain isotype exchange induced by a universal gene replacement vector.

    PubMed

    Kardinal, C; Selmayr, M; Mocikat, R

    1996-11-01

    Gene targeting at the immunoglobulin loci of B cells is an efficient tool for studying immunoglobulin expression or generating chimeric antibodies. We have shown that vector integration induced by human immunoglobulin G1 (IgG1) insertion vectors results in subsequent vector excision mediated by the duplicated target sequence, whereas replacement events which could be induced by the same constructs remain stable. We could demonstrate that the distribution of the vector homology strongly influences the genetic stability obtained. To this end we developed a novel type of a heavy chain replacement vector making use of the heavy chain class switch recombination sequence. Despite the presence of a two-sided homology this construct is universally applicable irrespective of the constant gene region utilized by the B cell. In comparison to an integration vector the frequency of stable incorporation was strongly increased, but we still observed vector excision, although at a markedly reduced rate. The latter events even occurred with circular constructs. Linearization of the construct at various sites and the comparison with an integration vector that carries the identical homology sequence, but differs in the distribution of homology, revealed the following features of homologous recombination of immunoglobulin genes: (i) the integration frequency is only determined by the length of the homology flank where the cross-over takes place; (ii) a 5' flank that does not meet the minimum requirement of homology length cannot be complemented by a sufficient 3' flank; (iii) free vector ends play a role for integration as well as for replacement targeting; (iv) truncating recombination events are suppressed in the presence of two flanks. Furthermore, we show that the switch region that was used as 3' flank is non-functional in an inverted orientation.

  6. Phylogenetic Analysis of Myobia musculi (Schranck, 1781) by Using the 18S Small Ribosomal Subunit Sequence

    PubMed Central

    Feldman, Sanford H; Ntenda, Abraham M

    2011-01-01

    We used high-fidelity PCR to amplify 2 overlapping regions of the ribosomal gene complex from the rodent fur mite Myobia musculi. The amplicons encompassed a large portion of the mite's ribosomal gene complex, spanning 3128 nucleotides and containing the entire 18S rRNA, internal transcribed spacer (ITS) 1, 5.8S rRNA, ITS2, and a portion of the 5′-end of the 28S rRNA. M. musculi’s 179-nucleotide 5.8S rRNA nucleotide sequence was not conserved, so this region was identified by conservation of rRNA secondary structure. Maximum likelihood and Bayesian inference phylogenetic analyses were performed by using a multiple sequence alignment consisting of 1524 nucleotides of M. musculi 18S rRNA and homologous sequences from 42 prostigmatid mites and the tick Dermacentor andersoni. The phylograms produced by both methods were in agreement regarding terminal, secondary, and some tertiary phylogenetic relationships among mites. Bayesian inference discriminated most infraordinal relationships between Eleutherengona and Parasitengona mites in the suborder Anystina. Basal relationships between suborders Anystina and Eupodina, historically determined by comparing differences in anatomic characteristics, were less well supported by our molecular analysis. Our results recapitulated similar, recently reported 18S rRNA sequence analyses. Our study supports M. musculi as belonging to the suborder Anystina, infraorder Eleutherengona, and superfamily Cheyletoidea. PMID:22330574

  7. Novel molecular approach to define pest species status and tritrophic interactions from historical Bemisia specimens.

    PubMed

    Tay, W T; Elfekih, S; Polaszek, A; Court, L N; Evans, G A; Gordon, K H J; De Barro, P J

    2017-03-27

    Museum specimens represent valuable genomic resources for understanding host-endosymbiont/parasitoid evolutionary relationships and for resolving species complexes and nomenclatural problems. However, museum collections suffer from DNA degradation, making them challenging for molecular-based studies. Here, the mitogenomes of a single 1912 Sri Lankan Bemisia emiliae cotype puparium and of a 1942 Japanese Bemisia puparium are characterised using a Next-Generation Sequencing approach. Whiteflies are small sap-sucking insects that include the B. tabaci pest species complex. Bemisia emiliae's draft mitogenome showed a high degree of homology with published B. tabaci mitogenomes, and exhibited 98-100% partial mitochondrial DNA Cytochrome Oxidase I (mtCOI) gene identity with the B. tabaci species known as Asia II-7. The partial mtCOI gene of the Japanese specimen shared 99% sequence identity with the Bemisia 'JpL' genetic group. Metagenomic analysis identified bacterial sequences in both Bemisia specimens, while hymenopteran sequences were also identified in the Japanese Bemisia puparium, including complete mtCOI and rRNA genes, and various partial mtDNA genes. Based on 88-90% mtCOI sequence identity to Aphelinidae wasps, we concluded that the 1942 Bemisia nymph was parasitized by an Eretmocerus parasitoid wasp. Our approach enables the characterisation of genomes and associated metagenomic communities of museum specimens from 1.5 ng gDNA, and the inference of historical tritrophic relationships in Bemisia whiteflies.

  8. Are Fireworms Venomous? Evidence for the Convergent Evolution of Toxin Homologs in Three Species of Fireworms (Annelida, Amphinomidae)

    PubMed Central

    Simpson, Danny

    2018-01-01

    Amphinomids, more commonly known as fireworms, are a basal lineage of marine annelids characterized by the presence of defensive dorsal calcareous chaetae, which break off upon contact. It has long been hypothesized that amphinomids are venomous and use the chaetae to inject a toxic substance. However, studies investigating fireworm venom from a morphological or molecular perspective are scarce and no venom gland has been identified to date, nor any toxin characterized at the molecular level. To investigate this question, we analyzed the transcriptomes of three species of fireworms—Eurythoe complanata, Hermodice carunculata, and Paramphinome jeffreysii—following a venomics approach to identify putative venom compounds. Our venomics pipeline involved de novo transcriptome assembly, open reading frame, and signal sequence prediction, followed by three different homology search strategies: BLAST, HMMER sequence, and HMMER domain. Following this pipeline, we identified 34 clusters of orthologous genes, representing 13 known toxin classes that have been repeatedly recruited into animal venoms. Specifically, the three species share a similar toxin profile with C-type lectins, peptidases, metalloproteinases, spider toxins, and CAP proteins found among the most highly expressed toxin homologs. Despite their great diversity, the putative toxins identified are predominantly involved in three major biological processes: hemostasis, inflammatory response, and allergic reactions, all of which are commonly disrupted after fireworm stings. Although the putative fireworm toxins identified here need to be further validated, our results strongly suggest that fireworms are venomous animals that use a complex mixture of toxins for defense against predators. PMID:29293976

  9. DNA Strand Exchange and RecA Homologs in Meiosis

    PubMed Central

    Brown, M. Scott; Bishop, Douglas K.

    2015-01-01

    Homology search and DNA strand–exchange reactions are central to homologous recombination in meiosis. During meiosis, these processes are regulated such that the probability of choosing a homolog chromatid as recombination partner is enhanced relative to that of choosing a sister chromatid. This regulatory process occurs as homologous chromosomes pair in preparation for assembly of the synaptonemal complex. Two strand–exchange proteins, Rad51 and Dmc1, cooperate in regulated homology search and strand exchange in most organisms. Here, we summarize studies on the properties of these two proteins and their accessory factors. In addition, we review current models for the assembly of meiotic strand–exchange complexes and the possible mechanisms through which the interhomolog bias of recombination partner choice is achieved. PMID:25475089

  10. Identification and Characterization of Putative Integron-Like Elements of the Heavy-Metal-Hypertolerant Strains of Pseudomonas spp.

    PubMed

    Ciok, Anna; Adamczuk, Marcin; Bartosik, Dariusz; Dziewit, Lukasz

    2016-11-28

    Pseudomonas strains isolated from the heavily contaminated Lubin copper mine and Zelazny Most post-flotation waste reservoir in Poland were screened for the presence of integrons. This analysis revealed that two strains carried homologous DNA regions composed of a gene encoding a DNA_BRE_C domain-containing tyrosine recombinase (with no significant sequence similarity to other integrases of integrons) plus a three-component array of putative integron gene cassettes. The predicted gene cassettes encode three putative polypeptides with homology to (i) transmembrane proteins, (ii) GCN5 family acetyltransferases, and (iii) hypothetical proteins of unknown function (homologous proteins are encoded by the gene cassettes of several class 1 integrons). Comparative sequence analyses identified three structural variants of these novel integron-like elements within the sequenced bacterial genomes. Analysis of their distribution revealed that they are found exclusively in strains of the genus Pseudomonas.

  11. GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering.

    PubMed

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2016-01-01

    Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. Graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform for accelerating such analyses. We therefore mapped the time-consuming steps of GHOSTZ, a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented this as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. In an evaluation using metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads.

  12. Cloning and nucleotide sequence of the Pseudomonas aeruginosa glucose-selective OprB porin gene and distribution of OprB within the family Pseudomonadaceae.

    PubMed

    Wylie, J L; Worobec, E A

    1994-03-01

    OprB is a glucose-selective porin known to be produced by Pseudomonas aeruginosa and Pseudomonas putida. We have cloned and sequenced the oprB gene of P. aeruginosa and obtained expression of OprB in Escherichia coli. The mature protein consists of 423 amino acid residues with a deduced molecular mass of 47597 Da. Several clusters of amino acid residues, potentially involved in the structure or function of the protein, were identified. An area of regional homology with E. coli LamB was also identified. Carbohydrate-inducible proteins, potentially homologous to OprB, were identified in several rRNA homology-group-I pseudomonads by sodium dodecyl sulfate/polyacrylamide gel electrophoresis analysis, Western immunoblotting and N-terminal amino acid sequencing. These species also contained DNA that hybridized to a P. aeruginosa oprB gene probe.

  13. ProtDec-LTR2.0: an improved method for protein remote homology detection by combining pseudo protein and supervised Learning to Rank.

    PubMed

    Chen, Junjie; Guo, Mingyue; Li, Shumin; Liu, Bin

    2017-11-01

    As one of the most important tasks in protein sequence analysis, protein remote homology detection is critical for both basic research and practical applications. Here, we present an effective web server for protein remote homology detection, called ProtDec-LTR2.0, which combines ProtDec-Learning to Rank (LTR) with a pseudo protein representation. Experimental results showed that detection performance was clearly improved. The web server provides a user-friendly interface to explore the sequence and structure information of candidate proteins and to find their conserved domains by launching a multiple sequence alignment tool. The web server is free and open to all users with no login requirement at http://bioinformatics.hitsz.edu.cn/ProtDec-LTR2.0/. Contact: bliu@hit.edu.cn. © The Author 2017. Published by Oxford University Press.

  14. Structural and functional characterization of a cell cycle associated HDAC1/2 complex reveals the structural basis for complex assembly and nucleosome targeting

    PubMed Central

    Itoh, Toshimasa; Fairall, Louise; Muskett, Frederick W.; Milano, Charles P.; Watson, Peter J.; Arnaudo, Nadia; Saleh, Almutasem; Millard, Christopher J.; El-Mezgueldi, Mohammed; Martino, Fabrizio; Schwabe, John W.R.

    2015-01-01

    Recent proteomic studies have identified a novel histone deacetylase complex that is upregulated during mitosis and is associated with cyclin A. This complex is conserved from nematodes to man and contains histone deacetylases 1 and 2, the MIDEAS corepressor protein and a protein called DNTTIP1 whose function was hitherto poorly understood. Here, we report the structures of two domains from DNTTIP1. The amino-terminal region forms a tight dimerization domain with a novel structural fold that interacts with and mediates assembly of the HDAC1:MIDEAS complex. The carboxy-terminal domain of DNTTIP1 has a structure related to the SKI/SNO/DAC domain, despite lacking obvious sequence homology. We show that this domain in DNTTIP1 mediates interaction with both DNA and nucleosomes. Thus, DNTTIP1 acts as a dimeric chromatin binding module in the HDAC1:MIDEAS corepressor complex. PMID:25653165

  15. Alignment of Common Wheat and Other Grass Genomes Establishes a Comparative Genomics Research Platform

    PubMed Central

    Sun, Sangrong; Wang, Jinpeng; Yu, Jigao; Meng, Fanbo; Xia, Ruiyan; Wang, Li; Wang, Zhenyi; Ge, Weina; Liu, Xiaojian; Li, Yuxian; Liu, Yinzhe; Yang, Nanshan; Wang, Xiyin

    2017-01-01

    Grass genomes are complicated structures, as they share a common tetraploidization and particular genomes have been further affected by extra polyploidizations. These events and the subsequent genomic re-patterning have resulted in a complex, interwoven gene homology both within and between genomes. Accurately deciphering the structure of these complicated plant genomes would help us better understand their compositional and functional evolution at multiple scales. Here, we build on our previous research by performing a hierarchical alignment of the common wheat genome vis-à-vis eight other sequenced grass genomes, using the most up-to-date assemblies and annotations. With these data, we constructed a list of homologous genes and then, in a layer-by-layer process, separated orthology, established by speciation, from paralogy, established by recursive polyploidization. Compared with the other grasses, the far fewer collinear outparalogous genes within each of the three subgenomes of common wheat suggest that homoeologous recombination and genomic fractionation occurred after its formation. In sum, this work contributes to the establishment of an important and timely comparative genomics platform for researchers in the grass community and possibly beyond. The homologous gene list is provided in the Supplemental material. PMID:28912789

  16. Spiroplasma species share common DNA sequences among their viruses, plasmids and genomes.

    PubMed

    Ranhand, J M; Nur, I; Rose, D L; Tully, J G

    1987-01-01

    Alkaline-Southern-blot analyses showed that a spiroplasma plasmid, pRA1, obtained from Spiroplasma citri (Maroc-R8A2), contained DNA sequences that were homologous to spiroplasma type 3 viruses (SV3) obtained from S. citri (Maroc-R8A2), S. citri (608) and S. mirum (SMCA). In addition, pRA1 and SV3(608) DNA shared common, but not necessarily related, sequences with extrachromosomal DNA derived from 11 Spiroplasma species or strains. Furthermore, SV3(608) had DNA homology with the chromosome from 6 distinct spiroplasmas but not with chromosomal DNA from eight other Spiroplasma species or strains. The biological function of these common sequences is unknown.

  17. Weak conservation of structural features in the interfaces of homologous transient protein–protein complexes

    PubMed Central

    Sudha, Govindarajan; Singh, Prashant; Swapna, Lakshmipuram S; Srinivasan, Narayanaswamy

    2015-01-01

    Residue types at the interface of protein–protein complexes (PPCs) are known to be reasonably well conserved. However, using a dataset of known 3-D structures of homologous transient PPCs, we show that the 3-D locations of interfacial residues are only moderately conserved and their interaction patterns are poorly conserved. Another surprising observation is that a conserved interface residue is not necessarily at the interface in the homolog. Such differences between homologous complexes manifest as substitutions of residues spatially proximal to the conserved residue, structural differences at the interfaces, and differences in the spatial orientations of the interacting proteins. Conservation of interface location and interaction pattern is higher at the core of the interface than at the periphery of the interface patch. The extent of variability in the structural features reported here for homologous transient PPCs is higher than that in homologous permanent homomers. Our findings suggest that straightforward extrapolation of interfacial nature and inter-residue interaction patterns from template to target could lead to serious errors in the modeled complex structure. Understanding the evolution of interfaces provides insights to improve comparative modeling of PPC structures. PMID:26311309
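
    One way to quantify the positional conservation discussed above is to map the interface residues of one complex onto its homolog through a residue alignment and count how many remain at the interface. The Python sketch below assumes the interface residue sets and the residue-to-residue mapping are already available (for example, from a distance cutoff and a structural or sequence alignment). The function name and inputs are hypothetical, and this is not necessarily the measure used by the authors.

```python
from __future__ import annotations


def interface_overlap(interface_a: set[int], interface_b: set[int],
                      mapping_a_to_b: dict[int, int]) -> float:
    """Fraction of aligned interface residues of complex A whose counterpart
    in homologous complex B is also an interface residue.

    Interface residues of A with no aligned counterpart in B are ignored.
    """
    aligned = [r for r in interface_a if r in mapping_a_to_b]
    if not aligned:
        return 0.0
    conserved = sum(1 for r in aligned if mapping_a_to_b[r] in interface_b)
    return conserved / len(aligned)
```

    Computing this separately for core and rim residues would mirror the core-versus-periphery comparison in the abstract, given a prior classification of each interface residue.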

  18. Comparative genomic de-convolution of the cotton genome revealed a decaploid ancestor and widespread chromosomal fractionation.

    PubMed

    Wang, Xiyin; Guo, Hui; Wang, Jinpeng; Lei, Tianyu; Liu, Tao; Wang, Zhenyi; Li, Yuxian; Lee, Tae-Ho; Li, Jingping; Tang, Haibao; Jin, Dianchuan; Paterson, Andrew H

    2016-02-01

    The 'apparently' simple genomes of many angiosperms mask complex evolutionary histories. The reference genome sequence for cotton (Gossypium spp.) revealed a ploidy change of unprecedented complexity, whose exact dosage could not be determined. Herein, by developing several comparative, computational and statistical approaches, we revealed a 5× multiplication, in the cotton lineage, of an ancestral genome common to cotton and cacao, and proposed evolutionary models to show how such a decaploid ancestor formed. The c. 70% gene loss necessary to bring the ancestral decaploid to its current gene count appears to fit an approximate geometrical model; that is, although many genes may be lost by single-gene deletion events, some may be lost in groups of consecutive genes. Gene loss following cotton decaploidy has largely just reduced the copy numbers of some homologous gene groups. We designed a novel approach to deconvolute layers of chromosome homology, providing definitive information on gene orthology and paralogy across broad evolutionary distances, which is of fundamental value and serves as an important platform for further studies within and beyond the cotton and genomics communities. No claim to original US government works. New Phytologist © 2015 New Phytologist Trust.
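
    One reading of the "approximate geometrical model" is that the number of consecutive genes removed by a single loss event follows a geometric distribution, P(k) = (1 - q) q^(k-1) for k = 1, 2, .... Under this model, single-gene losses are the most common, but longer runs still occur. The Python sketch below fits q by maximum likelihood (the mean run length equals 1/(1 - q), so q = 1 - 1/mean) and reports expected counts for comparison with observed ones. This interpretation, the function name and the example run lengths are assumptions, not the authors' published procedure or data.

```python
from __future__ import annotations

from collections import Counter


def fit_geometric_runs(run_lengths: list[int]) -> tuple[float, dict[int, float]]:
    """Fit P(k) = (1 - q) * q**(k - 1), k >= 1, to observed run lengths of
    consecutive lost genes; return q and expected counts for each observed k."""
    if not run_lengths or min(run_lengths) < 1:
        raise ValueError("run lengths must be positive integers")
    n = len(run_lengths)
    mean = sum(run_lengths) / n
    q = 1.0 - 1.0 / mean  # maximum-likelihood estimate for support k >= 1
    expected = {k: n * (1.0 - q) * q ** (k - 1) for k in sorted(set(run_lengths))}
    return q, expected


if __name__ == "__main__":
    # Hypothetical run lengths, invented for illustration only.
    runs = [1] * 50 + [2] * 25 + [3] * 12 + [4] * 6 + [5] * 3
    q, expected = fit_geometric_runs(runs)
    observed = Counter(runs)
    for k, e in expected.items():
        print(f"k={k}: observed={observed[k]}, expected={e:.1f}")
```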

  19. Structural and functional analyses of DM43, a snake venom metalloproteinase inhibitor from Didelphis marsupialis serum.

    PubMed

    Neves-Ferreira, Ana G C; Perales, Jonas; Fox, Jay W; Shannon, John D; Makino, Débora L; Garratt, Richard C; Domont, Gilberto B

    2002-04-12

    DM43, an opossum serum protein inhibitor of snake venom metalloproteinases, has been completely sequenced, and its disulfide bond pattern has been experimentally determined. It shows homology to human alpha(1)B-glycoprotein, a plasma protein of unknown function and a member of the immunoglobulin supergene family. Size exclusion and dynamic laser light scattering data indicated that two monomers of DM43, each composed of three immunoglobulin-like domains, associated to form a homodimer in solution. Analysis of its glycan moiety showed the presence of N-acetylglucosamine, mannose, galactose, and sialic acid, most probably forming four biantennary N-linked chains. DM43 inhibited the fibrinogenolytic activities of bothrolysin and jararhagin and formed 1:1 stoichiometric stable complexes with both metalloproteinases. DM43 was ineffective against atrolysin C or A. No complex formation was detected between DM43 and jararhagin C, indicating the essential role of the metalloproteinase domain for interaction. Homology modeling based on the crystal structure of a killer cell inhibitory receptor suggested the existence of an I-type Ig fold, a hydrophobic dimerization surface and six surface loops potentially forming the metalloproteinase-binding surface on DM43.

  20. CBH1 homologs and variant CBH1 cellulase

    DOEpatents

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Neefe, Paulien

    2014-07-01

    Disclosed are a number of homologs and variants of Hypocrea jecorina Cel7A (formerly Trichoderma reesei cellobiohydrolase I or CBH1), nucleic acids encoding the same and methods for producing the same. The homologs and variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted and/or deleted.

  1. CBH1 homologs and variant CBH1 cellulases

    DOEpatents

    Goedegebuur, Frits [Rozenlaan, NL]; Gualfetti, Peter [San Francisco, CA]; Mitchinson, Colin [Half Moon Bay, CA]; Neefe, Paulien [Zoetermeer, NL]

    2011-05-31

    Disclosed are a number of homologs and variants of Hypocrea jecorina Cel7A (formerly Trichoderma reesei cellobiohydrolase I or CBH1), nucleic acids encoding the same and methods for producing the same. The homologs and variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted and/or deleted.

  2. Occurrence and expression of gene transfer agent genes in marine bacterioplankton.

    PubMed

    Biers, Erin J; Wang, Kui; Pennington, Catherine; Belas, Robert; Chen, Feng; Moran, Mary Ann

    2008-05-01

    Genes with homology to the transduction-like gene transfer agent (GTA) were observed in genome sequences of three cultured members of the marine Roseobacter clade. A broader search for homologs for this host-controlled virus-like gene transfer system identified likely GTA systems in cultured Alphaproteobacteria, and particularly in marine bacterioplankton representatives. Expression of GTA genes and extracellular release of GTA particles (approximately 50 to 70 nm) was demonstrated experimentally for the Roseobacter clade member Silicibacter pomeroyi DSS-3, and intraspecific gene transfer was documented. GTA homologs are surprisingly infrequent in marine metagenomic sequence data, however, and the role of this lateral gene transfer mechanism in ocean bacterioplankton communities remains unclear.

  3. Occurrence and Expression of Gene Transfer Agent Genes in Marine Bacterioplankton

    PubMed Central

    Biers, Erin J.; Wang, Kui; Pennington, Catherine; Belas, Robert; Chen, Feng; Moran, Mary Ann

    2008-01-01

    Genes with homology to the transduction-like gene transfer agent (GTA) were observed in genome sequences of three cultured members of the marine Roseobacter clade. A broader search for homologs for this host-controlled virus-like gene transfer system identified likely GTA systems in cultured Alphaproteobacteria, and particularly in marine bacterioplankton representatives. Expression of GTA genes and extracellular release of GTA particles (∼50 to 70 nm) was demonstrated experimentally for the Roseobacter clade member Silicibacter pomeroyi DSS-3, and intraspecific gene transfer was documented. GTA homologs are surprisingly infrequent in marine metagenomic sequence data, however, and the role of this lateral gene transfer mechanism in ocean bacterioplankton communities remains unclear. PMID:18359833

  4. Nucleotide sequence of the ribosomal RNA gene of Physarum polycephalum: intron 2 and its flanking regions of the 26S rRNA gene.

    PubMed Central

    Nomiyama, H; Kuhara, S; Kukita, T; Otsuka, T; Sakaki, Y

    1981-01-01

    The 26S ribosomal RNA gene of Physarum polycephalum is interrupted by two introns, and we have previously determined the sequence of one of them (intron 1) (Nomiyama et al. Proc. Natl. Acad. Sci. USA 78, 1376-1380, 1981). In this study we sequenced the second intron (intron 2), about 0.5 kb long, and its flanking regions, and found that one nucleotide at each junction is identical in intron 1 and intron 2, although the junction regions share no other sequence homology. Comparison of the flanking exon sequences to E. coli 23S rRNA sequences shows that conserved sequences are interspersed with tracts having little homology. In particular, the region encompassing the intron 2 interruption site is highly conserved. The E. coli ribosomal protein L1 binding region is also conserved. PMID:6171776

  5. Cloning and Sequencing of Defective Particles Derived from the Autonomous Parvovirus Minute Virus of Mice for the Construction of Vectors with Minimal cis-Acting Sequences

    PubMed Central

    Clément, Nathalie; Avalosse, Bernard; El Bakkouri, Karim; Velu, Thierry; Brandenburger, Annick

    2001-01-01

    The production of wild-type-free stocks of recombinant parvovirus minute virus of mice [MVM(p)] is difficult due to the presence of homologous sequences in vector and helper genomes that cannot easily be eliminated from the overlapping coding sequences. We have therefore cloned and sequenced spontaneously occurring defective particles of MVM(p) with very small genomes to identify the minimal cis-acting sequences required for DNA amplification and virus production. One of them has lost all capsid-coding sequences but is still able to replicate in permissive cells when nonstructural proteins are provided in trans by a helper plasmid. Vectors derived from this particle produce stocks with no detectable wild-type MVM after cotransfection with new, matched, helper plasmids that present no homology downstream from the transgene. PMID:11152501

  6. Cloning and sequence analysis of complementary DNA encoding an aberrantly rearranged human T-cell gamma chain.

    PubMed Central

    Dialynas, D P; Murre, C; Quertermous, T; Boss, J M; Leiden, J M; Seidman, J G; Strominger, J L

    1986-01-01

    Complementary DNA (cDNA) encoding a human T-cell gamma chain has been cloned and sequenced. At the junction of the variable and joining regions, there is an apparent deletion of two nucleotides in the human cDNA sequence relative to the murine gamma-chain cDNA sequence, resulting simultaneously in the generation of an in-frame stop codon and in a translational frameshift. For this reason, the sequence presented here encodes an aberrantly rearranged human T-cell gamma chain. There are several surprising differences between the deduced human and murine gamma-chain amino acid sequences. These include poor homology in the variable region, poor homology in a discrete segment of the constant region precisely bounded by the expected junctions of exon CII, and the presence in the human sequence of five potential sites for N-linked glycosylation. PMID:3458221

  7. Nucleotide sequence determination of guinea-pig casein B mRNA reveals homology with bovine and rat alpha s1 caseins and conservation of the non-coding regions of the mRNA.

    PubMed Central

    Hall, L; Laird, J E; Craig, R K

    1984-01-01

    Nucleotide sequence analysis of cloned guinea-pig casein B cDNA sequences has identified two casein B variants related to the bovine and rat alpha s1 caseins. Amino acid homology was largely confined to the known bovine or predicted rat phosphorylation sites and within the 'signal' precursor sequence. Comparison of the deduced nucleotide sequence of the guinea-pig and rat alpha s1 casein mRNA species showed greater sequence conservation in the non-coding than in the coding regions, suggesting a functional and possibly regulatory role for the non-coding regions of casein mRNA. The results provide insight into the evolution of the casein genes, and raise questions as to the role of conserved nucleotide sequences within the non-coding regions of mRNA species. PMID:6548375

  8. Binding of the Ras activator son of sevenless to insulin receptor substrate-1 signaling complexes.

    PubMed

    Baltensperger, K; Kozma, L M; Cherniack, A D; Klarlund, J K; Chawla, A; Banerjee, U; Czech, M P

    1993-06-25

    Signal transmission by insulin involves tyrosine phosphorylation of a major insulin receptor substrate (IRS-1) and exchange of Ras-bound guanosine diphosphate for guanosine triphosphate. Proteins containing Src homology 2 and 3 (SH2 and SH3) domains, such as the p85 regulatory subunit of phosphatidylinositol-3 kinase and growth factor receptor-bound protein 2 (GRB2), bind tyrosine phosphate sites on IRS-1 through their SH2 regions. Such complexes in COS cells were found to contain the heterologously expressed putative guanine nucleotide exchange factor encoded by the Drosophila son of sevenless gene (dSos). Thus, GRB2, p85, or other proteins with SH2-SH3 adapter sequences may link Sos proteins to IRS-1 signaling complexes as part of the mechanism by which insulin activates Ras.

  9. DArT Markers Effectively Target Gene Space in the Rye Genome

    PubMed Central

    Gawroński, Piotr; Pawełkowicz, Magdalena; Tofil, Katarzyna; Uszyński, Grzegorz; Sharifova, Saida; Ahluwalia, Shivaksh; Tyrka, Mirosław; Wędzony, Maria; Kilian, Andrzej; Bolibok-Brągoszewska, Hanna

    2016-01-01

    Large genome size and complexity considerably hamper genomics research in the species affected. Rye (Secale cereale L.) has one of the largest genomes among cereal crops, and repetitive sequences account for over 90% of its length. Diversity Arrays Technology (DArT) is a high-throughput genotyping method in which preferential sampling of gene-rich regions is achieved through the use of methylation-sensitive restriction enzymes. We obtained sequences of 6,177 rye DArT markers and, following a redundancy analysis, assembled them into 3,737 non-redundant sequences, which were then used in homology searches against five Pooideae sequence sets. In total, 515 DArT sequences could be incorporated into publicly available rye genome zippers, providing a starting point for the integration of DArT- and transcript-based genomics resources in rye. Using the Blast2GO pipeline, we attributed putative gene functions to 1,101 (29.4%) of the non-redundant DArT marker sequences, including 132 sequences with putative disease resistance-related functions, which were found to be preferentially located on chromosome arms 4RL and 6RL. Comparative analysis based on the DArT sequences revealed obvious inconsistencies between two recently published high-density consensus maps of rye. Furthermore, we demonstrated that DArT marker sequences can be a source of SSR polymorphisms. The data obtained demonstrate that DArT markers effectively target gene space in the large, complex, and repetitive rye genome. Through the annotation of putative gene functions and the alignment of DArT sequences relative to reference genomes, we obtained information that will complement the results of studies in which DArT genotyping was deployed, by simplifying the identification of candidate genes based on gene ontology and microcolinearity. PMID:27833625

  10. DArT Markers Effectively Target Gene Space in the Rye Genome.

    PubMed

    Gawroński, Piotr; Pawełkowicz, Magdalena; Tofil, Katarzyna; Uszyński, Grzegorz; Sharifova, Saida; Ahluwalia, Shivaksh; Tyrka, Mirosław; Wędzony, Maria; Kilian, Andrzej; Bolibok-Brągoszewska, Hanna

    2016-01-01

    Large genome size and complexity considerably hamper genomics research in the species affected. Rye (Secale cereale L.) has one of the largest genomes among cereal crops, and repetitive sequences account for over 90% of its length. Diversity Arrays Technology (DArT) is a high-throughput genotyping method in which preferential sampling of gene-rich regions is achieved through the use of methylation-sensitive restriction enzymes. We obtained sequences of 6,177 rye DArT markers and, following a redundancy analysis, assembled them into 3,737 non-redundant sequences, which were then used in homology searches against five Pooideae sequence sets. In total, 515 DArT sequences could be incorporated into publicly available rye genome zippers, providing a starting point for the integration of DArT- and transcript-based genomics resources in rye. Using the Blast2GO pipeline, we attributed putative gene functions to 1,101 (29.4%) of the non-redundant DArT marker sequences, including 132 sequences with putative disease resistance-related functions, which were found to be preferentially located on chromosome arms 4RL and 6RL. Comparative analysis based on the DArT sequences revealed obvious inconsistencies between two recently published high-density consensus maps of rye. Furthermore, we demonstrated that DArT marker sequences can be a source of SSR polymorphisms. The data obtained demonstrate that DArT markers effectively target gene space in the large, complex, and repetitive rye genome. Through the annotation of putative gene functions and the alignment of DArT sequences relative to reference genomes, we obtained information that will complement the results of studies in which DArT genotyping was deployed, by simplifying the identification of candidate genes based on gene ontology and microcolinearity.

  11. Fos metamorphoses: Lessons from mutants in model organisms (Drosophila).

    PubMed

    Alfonso-Gonzalez, Carlos; Riesgo-Escovar, Juan Rafael

    2018-05-10

    The Fos oncogene family is evolutionarily conserved throughout Eukarya. Fos proteins characteristically have a leucine zipper and a basic region with a helix-turn-helix motif that binds DNA. In vertebrates, there are several Fos homologs, which can homo- or heterodimerize via the leucine zipper domain. Fos homologs coupled with other transcription factors, like the Jun oncoproteins, constitute the Activator Protein 1 (AP-1) complex. The Fos proteins have traversed a multifarious path in development and physiology: from their original identification as oncogenes, through the finding that they act as transcription factors binding DNA sequences known as TREs and the realization that they are activated in many different scenarios, to loss-of-function analysis. They are instrumental in 'immediate early gene' responses and are activated by a seemingly endless array of stimuli. Yet the majority of these studies were gain-of-function studies, since it was thought that loss of Fos genes would be cell lethal. Loss-of-function mutations in vertebrates were recovered later and were not cell lethal; in fact, c-fos null mutants are viable, with developmental defects (osteopetrosis and myeloid lineage abnormalities). It was then hypothesized that vertebrate genomes exhibit partial redundancy, explaining the 'mild' phenotypes and complicating assessment of complete loss-of-function phenotypes. Owing to their promiscuous activation, fos genes (especially c-fos) are now commonly used as markers of cellular responses to stimuli. The high sequence conservation of fos homologs (including in Drosophila) is advantageous because it allows critical assessment of fos gene functions in this genetic model. Drosophila melanogaster contains only one fos homolog, the gene kayak. kayak mutations are lethal and allow study of all the processes in which fos is required. The kayak locus encodes several different isoforms and is a pleiotropic gene required for various developmental processes involving cell shape changes. In general, fos genes seem to primarily activate programs involved in cellular architectural rearrangements and cell shape changes. Copyright © 2018. Published by Elsevier B.V.

  12. Detection of Helicobacter and Campylobacter spp. from the aquatic environment of marine mammals.

    PubMed

    Goldman, C G; Matteo, M J; Loureiro, J D; Degrossi, J; Teves, S; Heredia, S Rodriguez; Alvarez, K; González, A Beltrán; Catalano, M; Boccio, J; Cremaschi, G; Solnick, J V; Zubillaga, M B

    2009-01-13

    The mechanism by which Helicobacter species are transmitted remains unclear. To examine the possible role of environmental transmission in marine mammals, we looked for Helicobacter spp. and non-Helicobacter bacteria of the order Campylobacterales in water from the aquatic environment of marine mammals and in fish otoliths regurgitated by dolphins. Water was collected from six pools, two inhabited by dolphins and four inhabited by seals. Regurgitated otoliths were collected from the bottom of the dolphins' pools. Samples were evaluated by culture, PCR and DNA sequence analysis. Sequences from the dolphins' water and from regurgitated otoliths clustered with sequences from gastric fluids, dental plaque and saliva of dolphins living in those pools (99.8-100% identity) and with H. cetorum (99.5% identity). Sequences from the seals' water clustered with a sequence amplified from a Northern sea lion (AY203900), with 99.5% identity. Control PCRs on source water for the pools and on otoliths dissected from feeder fish were negative. The detection of Helicobacter spp. DNA in the aquatic environment suggests that contaminated water from regurgitated fish otoliths, and perhaps other tissues, may play a role in Helicobacter transmission among marine mammals.

  13. Isolation and characterization of a cDNA clone for the complete protein coding region of the delta subunit of the mouse acetylcholine receptor.

    PubMed Central

    LaPolla, R J; Mayne, K M; Davidson, N

    1984-01-01

    A mouse cDNA clone has been isolated that contains the complete coding region of a protein highly homologous to the delta subunit of the Torpedo acetylcholine receptor (AcChoR). The cDNA library was constructed in the vector lambda 10 from membrane-associated poly(A)+ RNA from BC3H-1 mouse cells. Surprisingly, the delta clone was selected by hybridization with cDNA encoding the gamma subunit of the Torpedo AcChoR. The nucleotide sequence of the mouse cDNA clone contains an open reading frame encoding 520 amino acids. This amino acid sequence exhibits 59% and 50% sequence homology to the Torpedo AcChoR delta and gamma subunits, respectively. However, the mouse nucleotide sequence has several stretches of high homology with the Torpedo gamma subunit cDNA, but not with delta. The mouse protein has the same general structural features as the Torpedo subunits. It is encoded by a 3.3-kilobase mRNA. There is probably only one, and at most two, chromosomal genes coding for this or closely related sequences. PMID:6096870

  14. Manipulation of Karyotype in Caenorhabditis elegans Reveals Multiple Inputs Driving Pairwise Chromosome Synapsis During Meiosis

    PubMed Central

    Roelens, Baptiste; Schvarzstein, Mara; Villeneuve, Anne M.

    2015-01-01

    Meiotic chromosome segregation requires pairwise association between homologs, stabilized by the synaptonemal complex (SC). Here, we investigate factors contributing to pairwise synapsis by examining meiosis in polyploid worms. We devised a strategy, based on transient inhibition of cohesin function, to generate polyploid derivatives of virtually any Caenorhabditis elegans strain. We exploited this strategy to investigate the contribution of recombination to pairwise synapsis in tetraploid and triploid worms. In otherwise wild-type polyploids, chromosomes first sort into homolog groups, then multipartner interactions mature into exclusive pairwise associations. Pairwise synapsis associations still form in recombination-deficient tetraploids, confirming a propensity for synapsis to occur in a strictly pairwise manner. However, the transition from multipartner to pairwise association was perturbed in recombination-deficient triploids, implying a role for recombination in promoting this transition when three partners compete for synapsis. To evaluate the basis of synapsis partner preference, we generated polyploid worms heterozygous for normal sequence and rearranged chromosomes sharing the same pairing center (PC). Tetraploid worms had no detectable preference for identical partners, indicating that PC-adjacent homology drives partner choice in this context. In contrast, triploid worms exhibited a clear preference for identical partners, indicating that homology outside the PC region can influence partner choice. Together, our findings suggest a two-phase model for C. elegans synapsis: an early phase, in which initial synapsis interactions are driven primarily by recombination-independent assessment of homology near PCs and by a propensity for pairwise SC assembly, and a later phase in which mature synaptic interactions are promoted by recombination. PMID:26500263

  15. Chromosome specific repetitive DNA sequences

    DOEpatents

    Moyzis, Robert K.; Meyne, Julianne

    1991-01-01

    A method is provided for determining specific nucleotide sequences useful in forming a probe that can identify specific chromosomes, preferably through in situ hybridization within the cell itself. In one embodiment, chromosome-preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified that have low homology with a sequence of the repetitive DNA family to which each clone belongs, and variant sequences are then identified by selecting clones whose pattern of hybridization with genomic DNA differs from the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences classified as chromosome preferential are further sequenced, and regions are identified that have low homology with other regions of the chromosome-preferential sequence or with known sequences of other family members. This invention is the result of a contract with the Department of Energy (Contract No. W-7405-ENG-36).

  16. Recombination–deletion between homologous cassettes in retrovirus is suppressed via a strategy of degenerate codon substitution

    PubMed Central

    Im, Eung Jun; Bais, Anthony J; Yang, Wen; Ma, Qiangzhong; Guo, Xiuyang; Sepe, Steven M; Junghans, Richard P

    2014-01-01

    Transduction and expression procedures in gene therapy protocols may optimally transfer more than a single gene to correct a defect and/or transmit new functions to recipient cells or organisms. This may be accomplished by transduction with two (or more) vectors, or, more efficiently, in a single vector. Occasionally, it may be useful to coexpress homologous genes or chimeric proteins with regions of shared homology. Retroviridae include the dominant vector systems for gene transfer (e.g., gamma-retro and lentiviruses) and are capable of such multigene expression. However, these same viruses are known for efficient recombination–deletion when domains are duplicated within the viral genome. This problem can be averted by resorting to two-vector strategies (two-chain two-vector), but at a penalty to cost, convenience, and efficiency. Employing a chimeric antigen receptor system as an example, we confirm that coexpression of two genes with homologous domains in a single gamma-retroviral vector (two-chain single-vector) leads to recombination–deletion between repeated sequences, excising the equivalent of one of the chimeric antigen receptors. Here, we show that a degenerate codon substitution strategy in the two-chain single-vector format efficiently suppressed intravector deletional loss with rescue of balanced gene coexpression by minimizing sequence homology between repeated domains and preserving the final protein sequence. PMID:25419532
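The degenerate codon substitution idea can be illustrated with a small Python sketch. It assumes the standard genetic code and a simple greedy rule: each codon is replaced by the synonymous codon that differs from it at the most nucleotide positions. This is a hypothetical illustration, not the authors' published design procedure. A practical design would also weigh codon usage and avoid creating new internal repeats. The helper names `recode`, `translate` and `percent_identity` are invented for this example.

```python
from itertools import product

# Standard genetic code (NCBI table 1): codons enumerated with bases in T, C, A, G
# order, first position varying slowest; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group synonymous codons by the amino acid (or stop signal) they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def translate(cds):
    """Translate an in-frame coding sequence using the standard code."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds):
    """Greedy synonymous recoding (hypothetical helper).

    Each codon is replaced by the synonymous codon that differs from it at the
    most positions, so the encoded protein is unchanged while nucleotide
    identity to the original copy drops. Ties go to the first codon in TCAG order.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    recoded = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        candidates = SYNONYMS[CODON_TABLE[codon]]
        recoded.append(max(candidates, key=lambda alt: mismatches(codon, alt)))
    return "".join(recoded)


def percent_identity(a, b):
    """Percent identical positions between two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * (len(a) - mismatches(a, b)) / len(a)


if __name__ == "__main__":
    original = "ATGGCTGCTAAA"  # toy sequence encoding M-A-A-K
    copy = recode(original)
    print(copy, translate(copy), percent_identity(original, copy))
```

For the toy sequence, `recode("ATGGCTGCTAAA")` returns `"ATGGCCGCCAAG"`. The protein (MAAK) is unchanged, while nucleotide identity to the original drops to 75%. Codons for Met and Trp have no synonyms and stay fixed, which limits how far identity between repeated domains can be reduced.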

  17. Exploring the energy landscape of antibody-antigen complexes: protein dynamics, flexibility, and molecular recognition.

    PubMed

    Thielges, Megan C; Zimmermann, Jörg; Yu, Wayne; Oda, Masayuki; Romesberg, Floyd E

    2008-07-08

    The production of antibodies that selectively bind virtually any foreign compound is the hallmark of the immune system. While much is understood about how sequence diversity contributes to this remarkable feat of molecular recognition, little is known about how sequence diversity affects antibody dynamics, which are also expected to contribute to molecular recognition. To address this question, we examined a panel of antibodies elicited to the chromophoric antigen fluorescein. On the basis of isothermal titration calorimetry, we selected six antibodies that bind fluorescein with diverse binding entropies, suggestive of varying contributions of dynamics to molecular recognition. Sequencing revealed that two pairs of antibodies employ homologous heavy chains derived from common germline genes, while the other two heavy chains and all six light chains were derived from different germline genes and are not homologous. Interestingly, more than half of all the somatic mutations acquired during affinity maturation among the six antibodies are located at positions unlikely to contact fluorescein directly. To quantify and compare the dynamics of the antibody-fluorescein complexes, we employed three-pulse photon echo peak shift and transient grating spectroscopy. All of the antibodies exhibited motions on three distinct time scales: ultrafast motions on the <100 fs time scale, diffusive motions on the picosecond time scale, and motions on time scales longer than nanoseconds, which therefore appear static. However, the exact frequency of the picosecond motion and the relative contributions of the different motions vary significantly among the antibody-chromophore complexes, revealing a high level of dynamic diversity. Using a hierarchical model, we relate the data to features of the antibodies' energy landscapes as well as to their flexibility in terms of elasticity and plasticity. Overall, the data provide a consistent picture of antibody flexibility, which appears to be correlated with binding entropy as well as with germline gene use and the mutations introduced during affinity maturation. The data also provide a gauge of the dynamic diversity of the antibody repertoire and suggest that this diversity might contribute to molecular recognition by facilitating the recognition of the broadest range of foreign molecules.

  18. Cytochrome b in human complex II (succinate-ubiquinone oxidoreductase): cDNA cloning of the components in liver mitochondria and chromosome assignment of the genes for the large (SDHC) and small (SDHD) subunits to 1q21 and 11q23.

    PubMed

    Hirawake, H; Taniwaki, M; Tamura, A; Kojima, S; Kita, K

    1997-01-01

    Complex II (succinate-ubiquinone oxidoreductase) is an important enzyme complex in both the tricarboxylic acid cycle and the aerobic respiratory chains of eukaryotic mitochondria and prokaryotic organisms. In this study, the amino acid sequences of the large (cybL) and small (cybS) subunits of cytochrome b in human liver complex II were deduced from cDNAs isolated by homology probing with mixed primers for the polymerase chain reaction. The mature cybL and cybS contain 140 and 103 amino acids, respectively, and show little similarity to the amino acid sequences of the subunits from other species, in contrast to the highly conserved flavoprotein (Fp) and iron-sulfur protein (Ip) subunits. From hydrophobicity analysis, both cybL and cybS appear to have three transmembrane segments, indicating their role as membrane anchors for the enzyme complex. Histidine residues, which are possible heme axial ligands in cytochrome b of complex II, were found in the second transmembrane segment of each subunit. The genes for cybL (SDHC) and cybS (SDHD) were mapped to chromosomes 1q21 and 11q23, respectively, by fluorescent in situ hybridization (FISH).

  19. Further delineation of nonhomologous-based recombination and evidence for subtelomeric segmental duplications in 1p36 rearrangements.

    PubMed

    D'Angelo, Carla S; Gajecka, Marzena; Kim, Chong A; Gentles, Andrew J; Glotzbach, Caron D; Shaffer, Lisa G; Koiffmann, Célia P

    2009-06-01

    The mechanisms involved in the formation of subtelomeric rearrangements are now beginning to be elucidated. Breakpoint sequencing analysis of 1p36 rearrangements has made important contributions to this line of inquiry. Despite the unique architecture of segmental duplications inherent to human subtelomeres, no common mechanism has been identified thus far, and different nonexclusive recombination-repair mechanisms seem to predominate. In order to gain further insights into the mechanisms of chromosome breakage, repair, and stabilization mediating subtelomeric rearrangements in humans, we investigated constitutional rearrangements of 1p36. Cloning of the breakpoint junctions in a complex rearrangement and three non-reciprocal translocations revealed similarities at the junctions, such as microhomology of up to three nucleotides, along with no significant sequence identity in close proximity to the breakpoint regions. All the breakpoints appeared to be unique, and their occurrence was limited to non-repetitive, unique DNA sequences. Several recombination- or cleavage-associated motifs that may promote non-homologous recombination were observed in close proximity to the junctions. We conclude that non-homologous end joining (NHEJ) is likely the DNA repair mechanism that generates these rearrangements. Additionally, two apparently pure terminal deletions were investigated, and refinement of their breakpoint regions identified two distinct genomic intervals ~25 kb apart, each containing a series of 1p36-specific segmental duplications with 90-98% identity. Segmental duplications can serve as substrates for ectopic homologous recombination or stimulate genomic rearrangements.
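The microhomology reported at the breakpoint junctions can be located with a short Python sketch. It assumes the junction sequence and the two partner reference sequences have already been trimmed to the same window, so that the first base of the junction aligns with the left partner and its last base with the right partner. Real breakpoint analysis needs a proper alignment step first. The function names `common_prefix_len` and `junction_microhomology` are hypothetical.

```python
def common_prefix_len(a, b):
    """Length of the longest shared prefix of two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def junction_microhomology(junction, left_ref, right_ref):
    """Classify a rearrangement junction (hypothetical helper).

    Assumes all three sequences cover the same window: junction[0] aligns with
    left_ref[0] and junction[-1] aligns with right_ref[-1]. Returns a tuple
    (microhomology, insertion); at most one is non-empty, and both are empty
    for a blunt join.
    """
    if not len(junction) == len(left_ref) == len(right_ref):
        raise ValueError("sequences must cover the same window")
    n = len(junction)
    left = common_prefix_len(junction, left_ref)                # explained by left partner
    right = common_prefix_len(junction[::-1], right_ref[::-1])  # explained by right partner
    overlap = left + right - n
    if overlap > 0:
        # Bases matching both partners cannot be assigned to either one.
        return junction[n - right:left], ""
    # Bases matched by neither partner form an insertion at the junction.
    return "", junction[left:n - right]
```

With made-up toy sequences, `junction_microhomology("AAAACGTCCC", "AAAACGTTTT", "GGGGCGTCCC")` returns `("CGT", "")`, a 3-nt microhomology. A junction whose middle bases match neither partner is instead reported as an insertion.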

  20. Resolution of Site-Specific Conformational Heterogeneity in Proline-Rich Molecular Recognition by Src Homology 3 Domains.

    PubMed

    Horness, Rachel E; Basom, Edward J; Mayer, John P; Thielges, Megan C

    2016-02-03

    Conformational heterogeneity and dynamics are increasingly invoked in models of protein molecular recognition but are challenging to characterize experimentally. Here we combine the inherent temporal resolution of infrared (IR) spectroscopy with the spatial resolution afforded by selective incorporation of carbon-deuterium (C-D) bonds, which provide frequency-resolved absorptions within a protein IR spectrum, to characterize the molecular recognition of the Src homology 3 (SH3) domain of the yeast protein Sho1 with its cognate proline-rich (PR) sequence of Pbs2. The IR absorptions of C-D bonds introduced at residues along a peptide of the Pbs2 PR sequence report on changes in the local environments upon binding to the SH3 domain. Interestingly, upon complex formation, the IR spectra of the peptides labeled with C-D bonds at either of the two conserved prolines of the PXXP consensus recognition sequence show more absorptions than there are C-D bonds, providing evidence for the population of multiple states. In contrast, the NMR spectra of the peptides labeled with ¹³C at the same residues show only single resonances, indicating rapid interconversion on the NMR time scale. Thus, the data suggest that the SH3 domain recognizes its cognate peptide with a component of induced-fit molecular recognition involving the adoption of multiple states, which have previously gone undetected because they interconvert too fast to be resolved by conventional methods.

  1. Meiosis in male Drosophila

    PubMed Central

    McKee, Bruce D.; Yan, Rihui; Tsai, Jui-He

    2012-01-01

    Meiosis entails sorting and separating both homologous and sister chromatids. The mechanisms for connecting sister chromatids and homologs during meiosis are highly conserved and include specialized forms of the cohesin complex and a tightly regulated homolog synapsis/recombination pathway designed to yield regular crossovers between homologous chromatids. Drosophila male meiosis is of special interest because it dispenses with large segments of the standard meiotic script, particularly recombination, synapsis and the associated structures. Instead, Drosophila relies on a unique protein complex composed of at least two novel proteins, SNM and MNM, to provide stable connections between homologs during meiosis I. Sister chromatid cohesion in Drosophila is mediated by cohesins, ring-shaped complexes that entrap sister chromatids. However, unlike other eukaryotes, Drosophila does not rely on the highly conserved Rec8 cohesin in meiosis, but instead utilizes two novel cohesion proteins, ORD and SOLO, which interact with the SMC1/3 cohesin components to provide meiotic cohesion. PMID:23087836

  2. Distribution of a Nocardia brasiliensis catalase gene fragment in members of the genera Nocardia, Gordona, and Rhodococcus.

    PubMed

    Vera-Cabrera, L; Johnson, W M; Welsh, O; Resendiz-Uresti, F L; Salinas-Carmona, M C

    1999-06-01

    An immunodominant protein from Nocardia brasiliensis, P61, was subjected to amino-terminal and internal sequence analysis. Three sequences of 22, 17, and 38 residues, respectively, were obtained and compared with the protein database from GenBank by using the BLAST system. The sequences showed homology to some eukaryotic catalases and to a bromoperoxidase-catalase from Streptomyces violaceus. Its identity as a catalase was confirmed by analysis of its enzymatic activity on H2O2 and by a double-staining method on a nondenaturing polyacrylamide gel with 3,3'-diaminobenzidine and ferricyanide; the result showed catalase activity but no peroxidase activity. By using one of the internal amino acid sequences and a consensus catalase motif (VGNNTP), we were able to design a PCR assay that generated a 500-bp PCR product. The amplicon was analyzed, and its nucleotide sequence was compared with the GenBank database, revealing high homology to other bacterial and eukaryotic catalases. A PCR assay based on this target sequence was performed with primers NB10 and NB11 to confirm the presence of the NB10-NB11 gene fragment in several N. brasiliensis strains isolated from mycetoma. The same assay was used to determine whether there were homologous sequences in several type strains from the genera Nocardia, Rhodococcus, Gordona, and Streptomyces. All of the N. brasiliensis strains were positive, but only some of the actinomycete species tested were positive in the PCR assay. To confirm these findings, genomic DNA was subjected to Southern blot analysis. A 1.7-kbp band was observed in the N. brasiliensis strains, and bands of different molecular weight were observed in cross-reacting actinomycetes. Sequence analysis of the amplicons of selected actinomycetes showed high homology in this catalase fragment, demonstrating that this protein is highly conserved in this group of bacteria.

  3. A discriminative method for family-based protein remote homology detection that combines inductive logic programming and propositional models

    PubMed Central

    2011-01-01

    Background Remote homology detection is a hard computational problem. Most approaches have trained computational models by using either full protein sequences or multiple sequence alignments (MSA), including all positions. However, when we deal with proteins in the "twilight zone", we can observe that only some segments of sequences (motifs) are conserved. We introduce a novel logical representation that allows us to represent physico-chemical properties of sequences, conserved amino acid positions and conserved physico-chemical positions in the MSA. From this, Inductive Logic Programming (ILP) finds the most frequent patterns (motifs) and uses them to train propositional models, such as decision trees and support vector machines (SVM). Results We use the SCOP database to perform our experiments by evaluating protein recognition within the same superfamily. Our results show that our methodology, when using SVM, performs significantly better than some state-of-the-art methods and comparably to others. In addition, our method provides a comprehensible set of logical rules that can help to understand what determines a protein's function. Conclusions The strategy of selecting only the most frequent patterns is effective for remote homology detection. This is possible through a suitable first-order logical representation of homologous properties, and through a set of frequent patterns, found by an ILP system, that summarizes essential features of protein functions. PMID:21429187

  4. A discriminative method for family-based protein remote homology detection that combines inductive logic programming and propositional models.

    PubMed

    Bernardes, Juliana S; Carbone, Alessandra; Zaverucha, Gerson

    2011-03-23

    Remote homology detection is a hard computational problem. Most approaches have trained computational models by using either full protein sequences or multiple sequence alignments (MSA), including all positions. However, when we deal with proteins in the "twilight zone", we can observe that only some segments of sequences (motifs) are conserved. We introduce a novel logical representation that allows us to represent physico-chemical properties of sequences, conserved amino acid positions and conserved physico-chemical positions in the MSA. From this, Inductive Logic Programming (ILP) finds the most frequent patterns (motifs) and uses them to train propositional models, such as decision trees and support vector machines (SVM). We use the SCOP database to perform our experiments by evaluating protein recognition within the same superfamily. Our results show that our methodology, when using SVM, performs significantly better than some state-of-the-art methods and comparably to others. In addition, our method provides a comprehensible set of logical rules that can help to understand what determines a protein's function. The strategy of selecting only the most frequent patterns is effective for remote homology detection. This is possible through a suitable first-order logical representation of homologous properties, and through a set of frequent patterns, found by an ILP system, that summarizes essential features of protein functions.
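    The ILP step in this record searches for first-order rules, which is beyond a short example. As a deliberately simplified stand-in (contiguous k-mers instead of logical clauses, toy sequences, and hypothetical function names), the sketch below shows the general pattern of keeping only patterns that are frequent within a family and encoding each sequence as a binary feature vector that a propositional learner such as a decision tree or SVM could consume.

```python
from collections import Counter
from typing import List


def frequent_kmers(seqs: List[str], k: int, min_support: float) -> List[str]:
    """k-mers that occur in at least a min_support fraction of the sequences."""
    support: Counter = Counter()
    for s in seqs:
        # Count each k-mer once per sequence (support, not raw frequency).
        support.update({s[i:i + k] for i in range(len(s) - k + 1)})
    threshold = min_support * len(seqs)
    return sorted(m for m, c in support.items() if c >= threshold)


def to_features(seq: str, motifs: List[str]) -> List[int]:
    """Binary presence/absence vector over the motif vocabulary."""
    return [1 if m in seq else 0 for m in motifs]


family = ["MKVLAAGH", "MKVIAAGW", "TKVLAAGH"]  # toy sequences, not real proteins
motifs = frequent_kmers(family, k=3, min_support=0.6)
print(motifs)                               # ['AAG', 'AGH', 'KVL', 'LAA', 'MKV', 'VLA']
print(to_features("GSKVLAAGT", motifs))     # [1, 0, 1, 1, 0, 1]
```

    The resulting vectors are what a propositional model would be trained on; the selection of only frequent patterns mirrors the strategy the abstract reports as effective, but the pattern language here is far weaker than first-order logic.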

  5. The organisation and interviral homologies of genes at the 3' end of tobacco rattle virus RNA1

    PubMed Central

    Boccara, Martine; Hamilton, William D. O.; Baulcombe, David C.

    1986-01-01

    The RNA1 of tobacco rattle virus (TRV) has been cloned as cDNA and the nucleotide sequence determined of 2 kb from the 3'-terminal region. The sequence contains three long open reading frames. One of these starts 5' of the cDNA and probably corresponds to the carboxy-terminal sequence of a 170-K protein encoded on RNA1. The deduced protein sequence from this reading frame shows homology with the putative replicases of tobacco mosaic virus (TMV) and tricornaviruses. The location of the second open reading frame, which encodes a 29-K polypeptide, was shown by Northern blot analysis to coincide with a 1.6-kb subgenomic RNA. The validity of this reading frame was confirmed by showing that the cDNA extending over this region could be transcribed and translated in vitro to produce a polypeptide of the predicted size which co-migrates in electrophoresis with a translation product of authentic viral RNA. The sequence of this 29-K polypeptide showed homology with two regions in the 30-K protein of TMV. This homology includes positions in the TMV 30-K protein where mutations have been identified which affect the transport of virus between cells. The third open reading frame encodes a potential 16-K protein and was shown by Northern blot hybridisation to be contained within the region of a 0.7-kb subgenomic RNA which is found in cellular RNA of infected cells but not virus particles. The many similarities between TRV and TMV in viral morphology, gene organisation and sequence suggest that these two viral groups may share a common viral ancestor. PMID:16453668

  6. The wheat cytochrome oxidase subunit II gene has an intron insert and three radical amino acid changes relative to maize

    PubMed Central

    Bonen, Linda; Boer, Poppo H.; Gray, Michael W.

    1984-01-01

    We have determined the sequence of the wheat mitochondrial gene for cytochrome oxidase subunit II (COII) and find that its derived protein sequence differs from that of maize at only three amino acid positions. Unexpectedly, all three replacements are non-conservative ones. The wheat COII gene has a highly-conserved intron at the same position as in maize, but the wheat intron is 1.5 times longer because of an insert relative to its maize counterpart. Hybridization analysis of mitochondrial DNA from rye, pea, broad bean and cucumber indicates strong sequence conservation of COII coding sequences among all these higher plants. However, only rye and maize mitochondrial DNA show homology with wheat COII intron sequences and rye alone with intron-insert sequences. We find that a sequence identical to the region of the 5' exon corresponding to the transmembrane domain of the COII protein is present at a second genomic location in wheat mitochondria. These variations in COII gene structure and size, as well as the presence of repeated COII sequences, illustrate at the DNA sequence level, factors which contribute to higher plant mitochondrial DNA diversity and complexity. PMID:16453565

  7. Nucleotide sequencing and serological evidence that the recently recognized deer tick virus is a genotype of Powassan virus.

    PubMed

    Beasley, D W; Suderman, M T; Holbrook, M R; Barrett, A D

    2001-11-05

    Deer tick virus (DTV) is a recently recognized North American virus isolated from Ixodes dammini ticks. Nucleotide sequencing of fragments of structural and non-structural protein genes suggested that this virus was most closely related to the tick-borne flavivirus Powassan (POW), which causes potentially fatal encephalitis in humans. To determine whether DTV represents a new and distinct member of the Flavivirus genus of the family Flaviviridae, we sequenced the structural protein genes and 5' and 3' non-coding regions of this virus. In addition, we compared the reactivity of DTV and POW in hemagglutination inhibition tests with a panel of polyclonal and monoclonal antisera, and performed cross-neutralization experiments using anti-DTV antisera. Nucleotide sequencing revealed a high degree of homology between DTV and POW at both nucleotide (>80% homology) and amino acid (>90% homology) levels, and the two viruses were indistinguishable in serological assays and mouse neuroinvasiveness. On the basis of these results, we suggest that DTV should be classified as a genotype of POW virus.

  8. Principles of long noncoding RNA evolution derived from direct comparison of transcriptomes in 17 species.

    PubMed

    Hezroni, Hadas; Koppstein, David; Schwartz, Matthew G; Avrutin, Alexandra; Bartel, David P; Ulitsky, Igor

    2015-05-19

    The inability to predict long noncoding RNAs from genomic sequence has impeded the use of comparative genomics for studying their biology. Here, we develop methods that use RNA sequencing (RNA-seq) data to annotate the transcriptomes of 16 vertebrates and the echinoid sea urchin, uncovering thousands of previously unannotated genes, most of which produce long intervening noncoding RNAs (lincRNAs). Although in each species, >70% of lincRNAs cannot be traced to homologs in species that diverged >50 million years ago, thousands of human lincRNAs have homologs with similar expression patterns in other species. These homologs share short, 5'-biased patches of sequence conservation nested in exonic architectures that have been extensively rewired, in part by transposable element exonization. Thus, over a thousand human lincRNAs are likely to have conserved functions in mammals, and hundreds beyond mammals, but those functions require only short patches of specific sequences and can tolerate major changes in gene architecture. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  9. AlloRep: A Repository of Sequence, Structural and Mutagenesis Data for the LacI/GalR Transcription Regulators.

    PubMed

    Sousa, Filipa L; Parente, Daniel J; Shis, David L; Hessman, Jacob A; Chazelle, Allen; Bennett, Matthew R; Teichmann, Sarah A; Swint-Kruse, Liskin

    2016-02-22

    Protein families evolve functional variation by accumulating point mutations at functionally important amino acid positions. Homologs in the LacI/GalR family of transcription regulators have evolved to bind diverse DNA sequences and allosteric regulatory molecules. In addition to playing key roles in bacterial metabolism, these proteins have been widely used as a model family for benchmarking structural and functional prediction algorithms. We have collected manually curated sequence alignments for >3000 sequences, in vivo phenotypic and biochemical data for >5750 LacI/GalR mutational variants, and noncovalent residue contact networks for 65 LacI/GalR homolog structures. Using this rich data resource, we compared the noncovalent residue contact networks of the LacI/GalR subfamilies to design and experimentally validate an allosteric mutant of a synthetic LacI/GalR repressor for use in biotechnology. The AlloRep database (freely available at www.AlloRep.org) is a key resource for future evolutionary studies of LacI/GalR homologs and for benchmarking computational predictions of functional change. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. Structures of Bacterial Biosynthetic Arginine Decarboxylases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    F Forouhar; S Lew; J Seetharaman

    2011-12-31

    Biosynthetic arginine decarboxylase (ADC; also known as SpeA) plays an important role in the biosynthesis of polyamines from arginine in bacteria and plants. SpeA is a pyridoxal-5'-phosphate (PLP)-dependent enzyme and shares weak sequence homology with several other PLP-dependent decarboxylases. Here, the crystal structure of PLP-bound SpeA from Campylobacter jejuni is reported at 3.0 Å resolution and that of Escherichia coli SpeA in complex with a sulfate ion is reported at 3.1 Å resolution. The structure of the SpeA monomer contains two large domains, an N-terminal TIM-barrel domain followed by a β-sandwich domain, as well as two smaller helical domains. The TIM-barrel and β-sandwich domains share structural homology with several other PLP-dependent decarboxylases, even though the sequence conservation among these enzymes is less than 25%. A similar tetramer is observed for both C. jejuni and E. coli SpeA, composed of two dimers of tightly associated monomers. The active site of SpeA is located at the interface of this dimer and is formed by residues from the TIM-barrel domain of one monomer and a highly conserved loop in the β-sandwich domain of the other monomer. The PLP cofactor is recognized by hydrogen-bonding, π-stacking and van der Waals interactions.

  11. Rare recessive loss-of-function methionyl-tRNA synthetase mutations presenting as a multi-organ phenotype

    PubMed Central

    2013-01-01

    Background Methionyl-tRNA synthetase (MARS) catalyzes the ligation of methionine to its cognate transfer RNA and therefore plays an essential role in protein biosynthesis. Methods We used exome sequencing, aminoacylation assays, homology modeling, and immuno-isolation of transfected MARS to identify and characterize mutations in the methionyl-tRNA synthetase gene (MARS) in an infant with an unexplained multi-organ phenotype. Results We identified compound heterozygous mutations (F370L and I523T) in highly conserved regions of MARS. The parents were each heterozygous for one of the mutations. Aminoacylation assays documented that the F370L and I523T MARS mutants had 18 ± 6% and 16 ± 6%, respectively, of wild-type activity. Homology modeling of the human MARS sequence with the structure of E. coli MARS showed that the F370L and I523T mutations are in close proximity to each other, with residue I523 located in the methionine binding pocket. We found that the F370L and I523T mutations did not affect the association of MARS with the multisynthetase complex. Conclusion This infant expands the catalogue of inherited human diseases caused by mutations in aminoacyl-tRNA synthetase genes. PMID:24103465

  12. Characterization of the Arabidopsis Augmin Complex Uncovers Its Critical Function in the Assembly of the Acentrosomal Spindle and Phragmoplast Microtubule Arrays

    PubMed Central

    Hotta, Takashi; Kong, Zhaosheng; Ho, Chin-Min Kimmy; Zeng, Cui Jing Tracy; Horio, Tetsuya; Fong, Sophia; Vuong, Trang; Lee, Yuh-Ru Julie; Liu, Bo

    2012-01-01

    Plant cells assemble the bipolar spindle and phragmoplast microtubule (MT) arrays in the absence of the centrosome structure. Our recent findings in Arabidopsis thaliana indicated that AUGMIN subunit3 (AUG3), a homolog of animal dim γ-tubulin 3, plays a critical role in γ-tubulin–dependent MT nucleation and amplification during mitosis. Here, we report the isolation of the entire plant augmin complex that contains eight subunits. Among them, AUG1 to AUG6 share low sequence similarity with their animal counterparts, but AUG7 and AUG8 share homology only with proteins of plant origin. Genetic analyses indicate that the AUG1, AUG2, AUG4, and AUG5 genes are essential, as stable mutations in these genes could only be transmitted to heterozygous plants. The sterile aug7-1 homozygous mutant in which AUG7 expression is significantly reduced exhibited pleiotropic phenotypes of seriously retarded vegetative and reproductive growth. The aug7-1 mutation caused delocalization of γ-tubulin in the mitotic spindle and phragmoplast. Consequently, spindles were abnormally elongated, and their poles failed to converge, as MTs were splayed to discrete positions rendering deformed arrays. In addition, the mutant phragmoplasts often had disorganized MT bundles with uneven edges. We conclude that assembly of MT arrays during plant mitosis depends on the augmin complex, which includes two plant-specific subunits. PMID:22505726

  13. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model.

    PubMed

    Wang, Sheng; Sun, Siqi; Li, Zhen; Zhang, Renyu; Xu, Jinbo

    2017-01-01

    Protein contacts contain key information for understanding protein structure and function, and thus contact prediction from sequence is an important problem. Recently, exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction. This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformations of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformations of pairwise information, including the output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationships and thus obtain higher-quality contact prediction regardless of how many sequence homologs are available for the proteins in question. Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracies obtained by our method, a representative EC method (CCMpred) and the CASP11 winner (MetaPSICOV) are 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracies of our method, CCMpred and MetaPSICOV are 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while folding with MetaPSICOV- and CCMpred-predicted contacts does so for only 79 and 62 of them, respectively.
Our contact-assisted models also have much better quality than template-based models, especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even though it was trained mostly on soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β protein of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented at that time. http://raptorx.uchicago.edu/ContactMap/.

  14. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model

    PubMed Central

    Li, Zhen; Zhang, Renyu

    2017-01-01

    Motivation Protein contacts contain key information for understanding protein structure and function, and thus contact prediction from sequence is an important problem. Recently, exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction. Method This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformations of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformations of pairwise information, including the output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationships and thus obtain higher-quality contact prediction regardless of how many sequence homologs are available for the proteins in question. Results Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracies obtained by our method, a representative EC method (CCMpred) and the CASP11 winner (MetaPSICOV) are 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracies of our method, CCMpred and MetaPSICOV are 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while folding with MetaPSICOV- and CCMpred-predicted contacts does so for only 79 and 62 of them, respectively.
Our contact-assisted models also have much better quality than template-based models, especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even though it was trained mostly on soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β protein of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented at that time. Availability http://raptorx.uchicago.edu/ContactMap/ PMID:28056090
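    The "top L" and "top L/10 long-range accuracy" figures in this record are precision values: rank the predicted residue pairs by score, keep the L/k highest-scoring long-range pairs (L = sequence length), and report the fraction that are true contacts. The sketch below computes that quantity under one labeled assumption, namely that "long-range" means a sequence separation of at least 24 residues (a common CASP convention; the abstract does not state the cutoff). The function name and toy data are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]


def top_long_range_precision(
    scores: Dict[Pair, float],
    true_contacts: Set[Pair],
    length: int,
    divisor: int = 1,
    min_sep: int = 24,  # assumed long-range cutoff, not taken from the abstract
) -> float:
    """Fraction of the top-(L/divisor) long-range predictions that are true contacts."""
    # Keep only long-range pairs (sequence separation >= min_sep).
    long_range = [(p, s) for p, s in scores.items() if abs(p[0] - p[1]) >= min_sep]
    # Rank by predicted contact score, highest first.
    long_range.sort(key=lambda item: item[1], reverse=True)
    top = long_range[: max(1, length // divisor)]
    if not top:
        return 0.0
    # Treat (i, j) and (j, i) as the same contact.
    truth = true_contacts | {(j, i) for i, j in true_contacts}
    hits = sum(1 for pair, _ in top if pair in truth)
    return hits / len(top)


# Toy example for a 50-residue protein; (5, 10) is short-range and ignored.
scores = {(1, 30): 0.9, (2, 40): 0.8, (5, 10): 0.95, (3, 45): 0.7, (4, 35): 0.1}
true_contacts = {(1, 30), (45, 3)}
print(top_long_range_precision(scores, true_contacts, 50, divisor=25))  # 0.5
print(top_long_range_precision(scores, true_contacts, 50, divisor=50))  # 1.0
```

    Dividing by the number of pairs actually kept, rather than always by L/k, only matters when fewer long-range predictions exist than the cutoff allows; real predictors score every pair, so the two denominators normally coincide.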

  15. Evolution and the complexity of bacteriophages.

    PubMed

    Serwer, Philip

    2007-03-13

    The genomes of both long-genome (>200 kb) bacteriophages and long-genome eukaryotic viruses have cellular gene homologs whose selective advantage is not explained. These homologs add genomic and possibly biochemical complexity. Understanding their significance requires a definition of complexity that is more biochemically oriented than past empirically based definitions. First, I propose two biochemistry-oriented definitions of complexity: either decreased randomness or increased encoded information that does not serve immediate needs. Then, I make the assumption that these two definitions are equivalent. This assumption and recent data lead to the following four-part hypothesis that explains the presence of cellular gene homologs in long bacteriophage genomes and also provides a pathway for complexity increases in prokaryotic cells: (1) Prokaryotes underwent evolutionary increases in biochemical complexity after the eukaryote/prokaryote splits. (2) Some of the complexity increases occurred via multi-step, weak selection that was both protected from strong selection and accelerated by embedding evolving cellular genes in the genomes of bacteriophages and, presumably, also archaeal viruses (first tier selection). (3) The mechanisms for retaining cellular genes in viral genomes evolved under additional, longer-term selection that was stronger (second tier selection). (4) The second tier selection was based on increased access by prokaryotic cells to improved biochemical systems. This access was achieved when DNA transfer moved to prokaryotic cells both the more evolved genes and their more competitive and complex biochemical systems.
I propose testing this hypothesis by controlled evolution in microbial communities to (1) determine the effects of deleting individual cellular gene homologs on the growth and evolution of long genome bacteriophages and hosts, (2) find the environmental conditions that select for the presence of cellular gene homologs, (3) determine which, if any, bacteriophage genes were selected for maintaining the homologs and (4) determine the dynamics of homolog evolution. This hypothesis is an explanation of evolutionary leaps in general. If accurate, it will assist both understanding and influencing the evolution of microbes and their communities. Analysis of evolutionary complexity increase for at least prokaryotes should include analysis of genomes of long-genome bacteriophages.

  16. Structure of the CRISPR Interference Complex CSM Reveals Key Similarities with Cascade

    PubMed Central

    Rouillon, Christophe; Zhou, Min; Zhang, Jing; Politis, Argyris; Beilsten-Edmands, Victoria; Cannone, Giuseppe; Graham, Shirley; Robinson, Carol V.; Spagnolo, Laura; White, Malcolm F.

    2013-01-01

    Summary The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) system is an adaptive immune system in prokaryotes. Interference complexes encoded by CRISPR-associated (cas) genes utilize small RNAs for homology-directed detection and subsequent degradation of invading genetic elements, and they have been classified into three main types (I–III). Type III complexes share the Cas10 subunit but are subclassified as type IIIA (CSM) and type IIIB (CMR), depending on their specificity for DNA or RNA targets, respectively. The role of CSM in limiting the spread of conjugative plasmids in Staphylococcus epidermidis was first described in 2008. Here, we report a detailed investigation of the composition and structure of the CSM complex from the archaeon Sulfolobus solfataricus, using a combination of electron microscopy, mass spectrometry, and deep sequencing. This reveals a three-dimensional model for the CSM complex that includes a helical component strikingly reminiscent of the backbone structure of the type I (Cascade) family. PMID:24119402

  17. Productive Homologous and Non-homologous Recombination of Hepatitis C Virus in Cell Culture

    PubMed Central

    Li, Yi-Ping; Mikkelsen, Lotte S.; Gottwein, Judith M.; Bukh, Jens

    2013-01-01

    Genetic recombination is an important mechanism for increasing diversity of RNA viruses, and constitutes a viral escape mechanism to host immune responses and to treatment with antiviral compounds. Although rare, epidemiologically important hepatitis C virus (HCV) recombinants have been reported. In addition, recombination is an important regulatory mechanism of cytopathogenicity for the related pestiviruses. Here we describe recombination of HCV RNA in cell culture leading to production of infectious virus. Initially, hepatoma cells were co-transfected with a replicating JFH1ΔE1E2 genome (genotype 2a) lacking functional envelope genes and strain J6 (2a), which has functional envelope genes but does not replicate in culture. After an initial decrease in the number of HCV-positive cells, infection spread after 13–36 days. Sequencing of recovered viruses revealed non-homologous recombinants with J6 sequence from the 5′ end to the NS2–NS3 region followed by JFH1 sequence from Core to the 3′ end. These recombinants carried duplicated sequence of up to 2400 nucleotides. HCV replication was not required for recombination, as recombinants were observed in most experiments even when two replication-incompetent genomes were co-transfected. Reverse genetic studies verified the viability of representative recombinants. After serial passage, subsequent recombination events reducing or eliminating the duplicated region were observed for some but not all recombinants. Furthermore, we found that inter-genotypic recombination could occur, but at a lower frequency than intra-genotypic recombination. Productive recombination of attenuated HCV genomes depended on expression of all HCV proteins and tolerated duplicated sequence. In general, no strong site specificity was observed. Non-homologous recombination was observed in most cases, while few homologous events were identified. 
A better understanding of HCV recombination could help identify natural recombinants and thereby lead to improved therapy. Our findings suggest mechanisms for the occurrence of recombinants observed in patients. PMID:23555245

  18. Network-based function prediction and interactomics: the case for metabolic enzymes.

    PubMed

    Janga, S C; Díaz-Mejía, J Javier; Moreno-Hagelsieb, G

    2011-01-01

    As sequencing technologies increase in power, determining the functions of the unknown proteins encoded by the resulting DNA sequences becomes a major challenge. Functional annotation is commonly done on the basis of amino-acid sequence similarity alone. Long after sequence similarity becomes undetectable by pairwise comparison, profile-based identification of homologs can often succeed because of the conservation of position-specific patterns that are important for a protein's three-dimensional folding and function. Nevertheless, prediction of protein function from homology-driven approaches is not without problems. Homologous proteins might evolve different functions, and the power of homology detection has already started to reach its maximum. Computational methods for inferring protein function that exploit the context of a protein in cellular networks have come to be built on top of homology-based approaches. These network-based functional inference techniques provide both a first-hand hint of a protein's functional role and insights complementary to traditional methods for understanding the function of uncharacterized proteins. Most recent network-based approaches aim to integrate diverse kinds of functional interactions to boost both coverage and confidence level. These techniques not only promise to address the moonlighting aspect of proteins by annotating proteins with multiple functions, but also increase our understanding of the interplay between different functional classes in a cell. In this article we review the state of the art in network-based function prediction and describe some of the underlying difficulties and successes. Given the volume of high-throughput data being reported, the time is ripe to employ these network-based approaches, which can be used to unravel the functions of the uncharacterized proteins accumulating in genomic databases. © 2010 Elsevier Inc. All rights reserved.
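    The simplest form of network-based functional inference is neighbor voting ("guilt by association"): an uncharacterized protein is assigned the functions most common among its interaction partners. The sketch below illustrates only that basic idea; it is not a method from the reviewed article, and the protein names, function labels, and `top_n` cutoff are hypothetical. Because it returns a ranked list rather than a single label, it can assign more than one function to a protein.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from a list of interaction pairs."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def predict_functions(
    protein: str,
    adj: Dict[str, Set[str]],
    annotations: Dict[str, Set[str]],
    top_n: int = 2,
) -> List[Tuple[str, int]]:
    """Rank functions by how many of the protein's neighbors carry them."""
    votes: Counter = Counter()
    for neighbor in adj.get(protein, set()):
        votes.update(annotations.get(neighbor, set()))
    # Sort by vote count (descending), then by name, so ties are deterministic.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Toy network: X is uncharacterized; A, B and C are annotated partners.
adj = build_adjacency([("X", "A"), ("X", "B"), ("X", "C"), ("A", "B")])
annotations = {"A": {"glycolysis"}, "B": {"glycolysis", "TCA cycle"}, "C": {"glycolysis"}}
print(predict_functions("X", adj, annotations))  # [('glycolysis', 3), ('TCA cycle', 1)]
```

    Integrating several kinds of functional interaction, as the abstract describes, would amount to merging edges from multiple evidence sources (possibly with weights) before voting; that extension is not shown here.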

  19. The Complete Genome Sequence of Herpesvirus Papio 2 (Cercopithecine Herpesvirus 16) Shows Evidence of Recombination Events among Various Progenitor Herpesviruses

    PubMed Central

    Tyler, Shaun D.; Severini, Alberto

    2006-01-01

    We have sequenced the entire genome of herpesvirus papio 2 (HVP-2; Cercopithecine herpesvirus 16) strain X313, a baboon herpesvirus with close homology to other primate alphaherpesviruses, such as SA8, monkey B virus, and herpes simplex virus (HSV) type 1 and type 2. The genome of HVP-2 is 156,487 bp in length, with an overall GC content of 76.5%. The genome organization is identical to that of the other members of the genus Simplexvirus, with a long and a short unique region, each bordered by inverted repeats which end with an “a” sequence. All of the open reading frames detected in this genome were homologous and colinear with those of SA8 and B virus. The HSV gene RL1 (γ₁34.5; neurovirulence factor) is not present in HVP-2, as is the case for SA8 and B virus. The HVP-2 genome is 85% homologous to its closest relative, SA8. However, segment-by-segment bootstrap analysis of the genome revealed at least two regions that display closer homology to the corresponding sequences of B virus. The first region comprises the UL41 to UL44 genes, and the second region is located within the UL36 gene. We hypothesize that this localized and defined shift in homology is due to recombination events between an SA8-like progenitor of HVP-2 and a herpesvirus species more closely related to the B virus. Since some of the genes involved in these putative recombination events are determinants of virulence, a comparative analysis of their function may provide insight into the pathogenic mechanism of simplexviruses. PMID:16414998

  20. The complete genome sequence of herpesvirus papio 2 (Cercopithecine herpesvirus 16) shows evidence of recombination events among various progenitor herpesviruses.

    PubMed

    Tyler, Shaun D; Severini, Alberto

    2006-02-01

    We have sequenced the entire genome of herpesvirus papio 2 (HVP-2; Cercopithecine herpesvirus 16) strain X313, a baboon herpesvirus with close homology to other primate alphaherpesviruses, such as SA8, monkey B virus, and herpes simplex virus (HSV) type 1 and type 2. The genome of HVP-2 is 156,487 bp in length, with an overall GC content of 76.5%. The genome organization is identical to that of the other members of the genus Simplexvirus, with a long and a short unique region, each bordered by inverted repeats which end with an "a" sequence. All of the open reading frames detected in this genome were homologous and colinear with those of SA8 and B virus. The HSV gene RL1 (γ₁34.5; neurovirulence factor) is not present in HVP-2, as is the case for SA8 and B virus. The HVP-2 genome is 85% homologous to its closest relative, SA8. However, segment-by-segment bootstrap analysis of the genome revealed at least two regions that display closer homology to the corresponding sequences of B virus. The first region comprises the UL41 to UL44 genes, and the second region is located within the UL36 gene. We hypothesize that this localized and defined shift in homology is due to recombination events between an SA8-like progenitor of HVP-2 and a herpesvirus species more closely related to the B virus. Since some of the genes involved in these putative recombination events are determinants of virulence, a comparative analysis of their function may provide insight into the pathogenic mechanism of simplexviruses.

  1. Immunizations with chimeric hepatitis B virus-like particles to induce potential anti-hepatitis C virus neutralizing antibodies.

    PubMed

    Vietheer, Patricia T K; Boo, Irene; Drummer, Heidi E; Netter, Hans-Jürgen

    2007-01-01

    Virus-like particles (VLPs) are highly immunogenic and have been proven to induce protective immunity. The small surface antigen (HBsAg-S) of hepatitis B virus (HBV) self-assembles into VLPs, and its use as a vaccine results in protective antiviral immunity against HBV infections. Chimeric HBsAg-S proteins carrying foreign epitopes allow particle formation and have the ability to induce anti-foreign humoral and cellular immune responses. The insertion of the hypervariable region 1 (HVR1) sequence derived from the envelope protein 2 (E2) of hepatitis C virus (HCV) into the major antigenic site of HBsAg-S (the 'a'-determinant) resulted in the formation of highly immunogenic VLPs that retained the antigenicity of the inserted HVR1 sequence. BALB/c mice were immunized with chimeric VLPs, which resulted in antisera with anti-HCV activity. The antisera were able to immunoprecipitate native HCV envelope complexes (E1E2) containing homologous or heterologous HVR1 sequences. HIV-1 particles pseudotyped with HCV E1E2 (HCVpp) were used to measure entry into HuH-7 target cells in the presence or absence of antisera raised against chimeric VLPs. Anti-HVR1 VLP sera interfered with the entry of entry-competent HCVpp containing either homologous or heterologous HVR1 sequences. Immunizations with chimeric VLPs also induced anti-surface antigen (HBsAg) antibodies, indicating that the HBV-specific antigenicity and immunogenicity of the 'a'-determinant region are retained. A multivalent vaccine against different pathogens based on the HBsAg delivery platform should be possible. We hypothesize that custom design of VLPs with an appropriate set of HCV-neutralizing epitopes will induce antibodies that would serve to decrease the viral load at the initial infecting inoculum.

  2. Mapping neurofibromatosis 1 homologous loci by fluorescence in situ hybridization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Viskochil, D.; Breidenbach, H.H.; Cawthon, R.

    Neurofibromatosis 1 maps to chromosome band 17q11.2, and the NF1 gene comprises 59 exons that span approximately 335 kb of genomic DNA. To further analyze the structure of NF1 from exons 2 through 27b, we isolated a number of cosmid and bacteriophage P-1 genomic clones using NF1-exon probes under high-stringency hybridization conditions. Using tagged, intron-based primers and DNA from various clones as a template, we PCR-amplified and sequenced individual NF1 exons. The exon sequences in PCR products from several genomic clones differed from the exon sequence derived from cloned NF1 cDNAs. Clones with variant sequences were mapped by fluorescence in situ hybridization under high-stringency conditions. Three clones mapped to chromosome band 15q11.2, one mapped to 14q11.2, one mapped to both 2q14.1-14.3 and 14q11.2, one mapped to 2q33-34, and one mapped to both 18q11.2 and 21q21. Even though some PCR-product sequences retained proper splice junctions and open reading frames, we have yet to identify cDNAs that correspond to the variant exon sequences. We are now sequencing clones that map to NF1-homologous loci in order to develop discriminating primer pairs for the exclusive amplification of NF1-specific sequences, as part of our efforts to develop a comprehensive NF1 mutation screen using genomic DNA as template. The role that NF1-homologous sequences may play in neurofibromatosis 1 is not clear.

  3. Evolution of EF-hand calcium-modulated proteins. IV. Exon shuffling did not determine the domain compositions of EF-hand proteins

    NASA Technical Reports Server (NTRS)

    Kretsinger, R. H.; Nakayama, S.

    1993-01-01

    In the previous three reports in this series we demonstrated that the EF-hand family of proteins evolved by a complex pattern of gene duplication, transposition, and splicing. The dendrograms based on exon sequences are nearly identical to those based on protein sequences for troponin C, the essential light chain myosin, the regulatory light chain, and calpain. This validates both the computational methods and the dendrograms for these subfamilies. The proposal of congruence for calmodulin, troponin C, essential light chain, and regulatory light chain was confirmed. There are, however, significant differences in the calmodulin dendrograms computed from DNA and from protein sequences. In this study we find that introns are distributed throughout the EF-hand domain and the interdomain regions. Further, dendrograms based on intron type and distribution bear little resemblance to those based on protein or on DNA sequences. We conclude that introns are inserted, and probably deleted, with relatively high frequency. Further, in the EF-hand family exons do not correspond to structural domains and exon shuffling played little if any role in the evolution of this widely distributed homolog family. Calmodulin has had a turbulent evolution. Its dendrograms based on protein sequence, exon sequence, 3'-tail sequence, intron sequences, and intron positions all show significant differences.

  4. Molecular characterization of DnaJ 5 homologs in silkworm Bombyx mori and its expression during egg diapause.

    PubMed

    Sirigineedi, Sasibhushan; Vijayagowri, Esvaran; Murthy, Geetha N; Rao, Guruprasada; Ponnuvel, Kangayam M

    2014-12-01

    A comparison of the cDNA sequence (1,056 bp) of the Bombyx mori DnaJ 5 homolog with the B. mori genome revealed that, unlike other Hsps, it has an intron of 234 bp. The DnaJ 5 homolog encodes 351 amino acids, 70 of which make up the conserved DnaJ domain at the N-terminal end. This B. mori homolog has all the functional domains found in its counterparts in other insects, and the 13 different DnaJ homologs identified in the B. mori genome are distributed on different chromosomes. Expressed sequence tag (EST) database analysis of Hsp40 gene expression revealed higher expression in the wing disc, followed by diapause-induced eggs. Microarray analysis revealed higher expression of the DnaJ 5 homolog at 18 h after oviposition in diapause-induced eggs. Further validation of DnaJ 5 expression by qPCR in diapause-induced and nondiapause eggs at different time points revealed higher expression in diapause eggs at 18 and 24 h after oviposition, which coincided with the expression of Hsp70, for which Hsp40 is a co-chaperone. This study thus provides an outline of the genome organization of the Hsp40 gene and of its role in egg diapause induction in B. mori. © 2013 Institute of Zoology, Chinese Academy of Sciences.

  5. ECOD: An Evolutionary Classification of Protein Domains

    PubMed Central

    Kinch, Lisa N.; Pei, Jimin; Shi, Shuoyong; Kim, Bong-Hyun; Grishin, Nick V.

    2014-01-01

    Understanding the evolution of a protein, including both close and distant relationships, often reveals insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures, and presented it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or “fold”). This distinction highlights cases of homology between domains of differing topology to aid in understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. ECOD also recognizes closer sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating the potential of how ECOD can be used to further biological and evolutionary studies. PMID:25474468

  6. ECOD: an evolutionary classification of protein domains.

    PubMed

    Cheng, Hua; Schaeffer, R Dustin; Liao, Yuxing; Kinch, Lisa N; Pei, Jimin; Shi, Shuoyong; Kim, Bong-Hyun; Grishin, Nick V

    2014-12-01

    Understanding the evolution of a protein, including both close and distant relationships, often reveals insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures, and presented it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or "fold"). This distinction highlights cases of homology between domains of differing topology to aid in understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. ECOD also recognizes closer sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating the potential of how ECOD can be used to further biological and evolutionary studies.

  7. Evolutionary distance from human homologs reflects allergenicity of animal food proteins.

    PubMed

    Jenkins, John A; Breiteneder, Heimo; Mills, E N Clare

    2007-12-01

    In silico analysis of allergens can identify putative relationships among protein sequence, structure, and allergenic properties. Such systematic analysis reveals that most plant food allergens belong to a restricted number of protein superfamilies, with pollen allergens behaving similarly. We have investigated the structural relationships of animal food allergens and their evolutionary relatedness to human homologs to define how closely a protein must resemble a human counterpart to lose its allergenic potential. Profile-based sequence homology methods were used to classify animal food allergens into Pfam families, and in silico analyses of their evolutionary and structural relationships were performed. Animal food allergens could be classified into 3 main families--tropomyosins, EF-hand proteins, and caseins--along with 14 minor families each composed of 1 to 3 allergens. The evolutionary relationships of each of these allergen superfamilies showed that in general, proteins with a sequence identity to a human homolog above approximately 62% were rarely allergenic. Single substitutions in otherwise highly conserved regions containing IgE epitopes in EF-hand parvalbumins may modulate allergenicity. These data support the premise that certain protein structures are more allergenic than others. Contrasting with plant food allergens, animal allergens, such as the highly conserved tropomyosins, challenge the capability of the human immune system to discriminate between foreign and self-proteins. Such immune responses run close to becoming autoimmune responses. Exploiting the closeness between animal allergens and their human homologs in the development of recombinant allergens for immunotherapy will need to consider the potential for developing unanticipated autoimmune responses.

  8. Cloning and characterization of the ddc homolog encoding L-2,4-diaminobutyrate decarboxylase in Enterobacter aerogenes.

    PubMed

    Yamamoto, S; Mutoh, N; Tsuzuki, D; Ikai, H; Nakao, H; Shinoda, S; Narimatsu, S; Miyoshi, S I

    2000-05-01

    L-2,4-diaminobutyrate decarboxylase (DABA DC) catalyzes the formation of 1,3-diaminopropane (DAP) from DABA. In the present study, the ddc gene encoding DABA DC from Enterobacter aerogenes ATCC 13048 was cloned and characterized. Determination of the nucleotide sequence revealed an open reading frame of 1470 bp encoding a 53659-Da protein of 490 amino acids, whose deduced NH2-terminal sequence was identical to that of purified DABA DC from E. aerogenes. The deduced amino acid sequence was highly similar to those of Acinetobacter baumannii and Haemophilus influenzae DABA DCs encoded by the ddc genes. The lysine-307 of the E. aerogenes DABA DC was identified as the pyridoxal 5'-phosphate binding residue by site-directed mutagenesis. Furthermore, PCR analysis revealed the distribution of E. aerogenes ddc homologs in some other species of Enterobacteriaceae. Such a relatively wide occurrence of the ddc homologs implies biological significance of DABA DC and its product DAP.

  9. GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering

    PubMed Central

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2016-01-01

    Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. To accelerate such analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. We therefore mapped the time-consuming steps of GHOSTZ, a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented the result as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. In an evaluation using metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads. PMID:27482905

  10. SANSparallel: interactive homology search against Uniprot

    PubMed Central

    Somervuo, Panu; Holm, Liisa

    2015-01-01

    Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest. PMID:25855811

  11. Heterochromatic self-association, a determinant of nuclear organization, does not require sequence homology in Drosophila.

    PubMed Central

    Sage, Brian T; Csink, Amy K

    2003-01-01

    Chromosomes of higher eukaryotes contain blocks of heterochromatin that can associate with each other in the interphase nucleus. A well-studied example of heterochromatic interaction is the brown(Dominant) (bwD) chromosome of D. melanogaster, which contains an approximately 1.6-Mbp insertion of AAGAG repeats near the distal tip of chromosome 2. This insertion causes association of the tip with the centric heterochromatin of chromosome 2 (2h), which contains megabases of AAGAG repeats. Here we describe an example, other than bwD, in which distally translocated heterochromatin associates with centric heterochromatin. Additionally, we show that when a translocation places bwD on a different chromosome, bwD tends to associate with the centric heterochromatin of that chromosome, even when the chromosome contains only a small fraction of the sequence homology present elsewhere. To further test the importance of sequence homology in these interactions, we used interspecific mating to introgress the bwD allele from D. melanogaster into D. simulans, which lacks AAGAG repeats on the autosomes. We find that D. simulans bwD associates with 2h, which lacks the AAGAG sequence, but does not associate with the AAGAG-containing heterochromatin of the X chromosome. Our results show that intranuclear association of separate heterochromatic blocks does not require that they contain the same sequence. PMID:14668374

  12. Saccharomyces cerevisiae SSB1 protein and its relationship to nucleolar RNA-binding proteins.

    PubMed

    Jong, A Y; Clark, M W; Gilbert, M; Oehm, A; Campbell, J L

    1987-08-01

    To better define the function of Saccharomyces cerevisiae SSB1, an abundant single-stranded nucleic acid-binding protein, we determined the nucleotide sequence of the SSB1 gene and compared it with those of other proteins of known function. The amino acid sequence contains 293 amino acid residues and has an Mr of 32,853. There are several stretches of sequence characteristic of other eucaryotic single-stranded nucleic acid-binding proteins. At the amino terminus, residues 39 to 54 are highly homologous to a peptide in calf thymus UP1 and UP2 and a human heterogeneous nuclear ribonucleoprotein. Residues 125 to 162 constitute a fivefold tandem repeat of the sequence RGGFRG, the composition of which suggests a nucleic acid-binding site. Near the C terminus, residues 233 to 245 are homologous to several RNA-binding proteins. Of 18 C-terminal residues, 10 are acidic, a characteristic of the procaryotic single-stranded DNA-binding proteins and eucaryotic DNA- and RNA-binding proteins. In addition, examination of the subcellular distribution of SSB1 by immunofluorescence microscopy indicated that SSB1 is a nuclear protein, predominantly located in the nucleolus. Sequence homologies and the nucleolar localization make it likely that SSB1 functions in RNA metabolism in vivo, although an additional role in DNA metabolism cannot be excluded.

  13. Microsporidia, amitochondrial protists, possess a 70-kDa heat shock protein gene of mitochondrial evolutionary origin.

    PubMed

    Peyretaillade, E; Broussolle, V; Peyret, P; Méténier, G; Gouy, M; Vivarès, C P

    1998-06-01

    An intronless gene encoding a protein of 592 amino acid residues with similarity to 70-kDa heat shock proteins (HSP70s) has been cloned and sequenced from the amitochondrial protist Encephalitozoon cuniculi (phylum Microsporidia). Southern blot analyses show the presence of a single gene copy located on chromosome XI. The encoded protein exhibits an N-terminal hydrophobic leader sequence and two motifs shared by proteobacterial and mitochondrially expressed HSP70 homologs. Phylogenetic analysis using maximum likelihood and evolutionary distances place the E. cuniculi sequence in the cluster of mitochondrially expressed HSP70s, with a higher evolutionary rate than those of homologous sequences. Similar results were obtained after cloning a fragment of the homologous gene in the closely related species E. hellem. The presence of a nuclear targeting signal-like sequence supports a role of the Encephalitozoon HSP70 as a molecular chaperone of nuclear proteins. No evidence for cytosolic or endoplasmic reticulum forms of HSP70 was obtained through PCR amplification. These data suggest that Encephalitozoon species have evolved from an ancestor bearing mitochondria, which is in disagreement with the postulated presymbiotic origin of Microsporidia. The specific role and intracellular localization of the mitochondrial HSP70-like protein remain to be elucidated.

  14. Using relational databases for improved sequence similarity searching and large-scale genomic analyses.

    PubMed

    Mackey, Aaron J; Pearson, William R

    2004-10-01

    Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for the management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and to carry out various large-scale genomic analyses of homology-related data. It covers the installation and use of a simple protein sequence database, seqdb_demo, which serves as the basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, extending seqdb_demo to store sequence similarity search results, and using various kinds of stored search results to address aspects of comparative genomic analysis.

  15. Top-Down-Assisted Bottom-Up Method for Homologous Protein Sequencing: Hemoglobin from 33 Bird Species

    NASA Astrophysics Data System (ADS)

    Song, Yang; Laskay, Ünige A.; Vilcins, Inger-Marie E.; Barbour, Alan G.; Wysocki, Vicki H.

    2015-11-01

    Ticks are vectors for disease transmission because they are indiscriminate in their feeding on multiple vertebrate hosts, transmitting pathogens between their hosts. Identifying the hosts on which ticks have fed is important for disease prevention and intervention. We have previously shown that hemoglobin (Hb) remnants from a host on which a tick fed can be used to reveal the host's identity. For the present research, blood was collected from 33 bird species that are common tick hosts in the U.S. but whose Hb sequences are unknown. A top-down-assisted bottom-up mass spectrometry approach with a customized search database, based on variability in known bird hemoglobin sequences, has been devised to facilitate fast and complete sequencing of hemoglobin from birds with unknown sequences. These hemoglobin sequences will be added to a hemoglobin database and used for tick host identification. The general approach has the potential to sequence any set of homologous proteins completely in a rapid manner.

  16. The HMMER Web Server for Protein Sequence Similarity Search.

    PubMed

    Prakash, Ananth; Jeffryes, Matt; Bateman, Alex; Finn, Robert D

    2017-12-08

    Protein sequence similarity search is one of the most commonly used bioinformatics methods for identifying evolutionarily related proteins. In general, sequences that are evolutionarily related share some degree of similarity, and sequence-search algorithms use this principle to identify homologs. The requirement for a fast and sensitive sequence search method led to the development of the HMMER software, which in the latest version (v3.1) uses a combination of sophisticated acceleration heuristics and mathematical and computational optimizations to enable the use of profile hidden Markov models (HMMs) for sequence analysis. The HMMER Web server provides a common platform by linking the HMMER algorithms to databases, thereby enabling the search for homologs, as well as providing sequence and functional annotation by linking external databases. This unit describes three basic protocols and two alternate protocols that explain how to use the HMMER Web server using various input formats and user defined parameters. © 2017 by John Wiley & Sons, Inc.

  17. Molecular characterization of a novel Luteovirus from peach identified by high-throughput sequencing

    USDA-ARS's Scientific Manuscript database

    Contigs with sequence homologies to Cherry-associated luteovirus were identified by high-throughput sequencing analysis of two peach accessions undergoing quarantine testing. The complete genomic sequences of the two isolates of this virus are 5,819 and 5,814 nucleotides. Their genome organization i...

  18. Differential protein expression in alligator leukocytes in response to bacterial lipopolysaccharide injection.

    PubMed

    Merchant, Mark; Kinney, Clint; Sanders, Paige

    2009-12-01

    Blood was collected from three juvenile alligators (Alligator mississippiensis) before, and again 24 h after, injection with bacterial lipopolysaccharide (LPS). The leukocytes were collected from both samples, and the proteins were extracted. Each group of proteins was labeled with a different fluorescent dye, and the differences in protein expression were analyzed by two-dimensional difference gel electrophoresis (2D-DIGE). The proteins that appeared to be increased or decreased by treatment with LPS were selected and analyzed by MALDI-TOF to determine mass and by LC-MS/MS to acquire partial protein sequences. The peptide sequences were compared to the NCBI protein sequence database to determine homology with sequences from other species. Several proteins of interest appeared to be increased upon LPS stimulation. Proteins with homology to human transgelin-2, fish glucose-6-phosphate dehydrogenase, amphibian α-enolase, alligator lactate dehydrogenase, fish ubiquitin-activating enzyme, and fungal β-tubulin were also increased after LPS injection. Proteins with homology to fish vimentin 4, murine heterogeneous nuclear ribonucleoprotein A3, and avian calreticulin were found to be decreased in response to LPS. In addition, five proteins, four of which were up-regulated (827, 560, 512, and 650%) and one that exhibited repressed expression (307%), did not show homology to any protein in the database, and thus may represent newly discovered proteins. We are using this biochemical approach to isolate and characterize alligator proteins with potentially relevant immune function.

  19. Mycobacterial polyketide-associated proteins are acyltransferases: Proof of principle with Mycobacterium tuberculosis PapA5

    PubMed Central

    Onwueme, Kenolisa C.; Ferreras, Julian A.; Buglino, John; Lima, Christopher D.; Quadri, Luis E. N.

    2004-01-01

    Mycobacterium tuberculosis (Mt) produces complex virulence-enhancing lipids with scaffolds consisting of phthiocerol and phthiodiolone dimycocerosate esters (PDIMs). Sequence analysis suggested that PapA5, a so-called polyketide-associated protein (Pap) encoded in the PDIM synthesis gene cluster, as well as PapA5 homologs found in Mt and other species, form a subfamily of acyltransferases. Studies with recombinant protein confirmed that PapA5 is an acyltransferase. Deletion analysis in Mt demonstrated that papA5 is required for PDIM synthesis. We propose that PapA5 catalyzes diesterification of phthiocerol and phthiodiolone with mycocerosate. These studies present the functional characterization of a Pap and permit inferences regarding the roles of other Paps in the synthesis of complex lipids, including the antibiotic rifamycin. PMID:15070765

  20. Molecular identification of aiiA homologous gene from endophytic Enterobacter species and in silico analysis of putative tertiary structure of AHL-lactonase.

    PubMed

    Rajesh, P S; Rai, V Ravishankar

    2014-01-03

    The aiiA homologous gene is known to encode AHL-lactonase, an enzyme that hydrolyzes the N-acylhomoserine lactone (AHL) quorum-sensing signaling molecules produced by Gram-negative bacteria. In this study, degradation of AHL molecules by the cell-free lysate of endophytic Enterobacter species was determined. Quorum quenching was confirmed, and its percentage quantified, by HPLC (p<0.0001). Amplification and BLAST sequence analysis showed the presence of the aiiA homologous gene in the endophytic strains Enterobacter asburiae VT65, Enterobacter aerogenes VT66 and Enterobacter ludwigii VT70. Sequence alignment revealed two zinc-binding sites, the "HXHXDH" motif, and a tyrosine residue at position 194. A putative tertiary structure of the AHL-lactonase was constructed from a known template available at Swiss-Model. The results show that novel endophytic Enterobacter strains encode a novel aiiA homologous gene whose structure merits future study. Copyright © 2013 Elsevier Inc. All rights reserved.
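    Motif notation such as "HXHXDH" uses X for any single residue. The sketch below, offered only as a hedged illustration, shows one way to locate such a motif in a protein sequence with a regular expression. It assumes that X means any of the 20 standard amino acids. The function names and the toy sequence are hypothetical and are not taken from the study or from the AiiA protein.

```python
import re

# Assumption: "X" stands for any one of the 20 standard amino acids.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def motif_to_regex(motif: str) -> re.Pattern:
    """Translate a simple motif (e.g. "HXHXDH") into a compiled regex.

    The pattern is wrapped in a lookahead so that overlapping
    occurrences are also reported.
    """
    body = "".join(
        f"[{STANDARD_AA}]" if ch == "X" else re.escape(ch) for ch in motif.upper()
    )
    return re.compile(f"(?=({body}))")


def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched segment) for every hit."""
    pattern = motif_to_regex(motif)
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    toy_seq = "MKTHLHSDHVV"  # hypothetical toy sequence, not AiiA
    print(find_motif(toy_seq, "HXHXDH"))  # [(4, 'HLHSDH')]
```

    Checking a single residue, such as the tyrosine at position 194, then reduces to reading seq[193] with Python's 0-based indexing.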

  1. The Comparative Genomics and Phylogenomics of Leishmania amazonensis Parasite

    PubMed Central

    Tschoeke, Diogo A; Nunes, Gisele L; Jardim, Rodrigo; Lima, Joana; Dumaresq, Aline SR; Gomes, Monete R; de Mattos Pereira, Leandro; Loureiro, Daniel R; Stoco, Patricia H; de Matos Guedes, Herbert Leonel; de Miranda, Antonio Basilio; Ruiz, Jeronimo; Pitaluga, André; Silva, Floriano P; Probst, Christian M; Dickens, Nicholas J; Mottram, Jeremy C; Grisard, Edmundo C; Dávila, Alberto MR

    2014-01-01

    Leishmaniasis is an infectious disease caused by Leishmania species. Leishmania amazonensis is a New World Leishmania species belonging to the Mexicana complex, which is able to cause all forms of leishmaniasis. The L. amazonensis reference strain MHOM/BR/1973/M2269 was sequenced, identifying 8,802 coding sequences (CDS), most of them of hypothetical function. Comparative analysis of six Leishmania species showed a core set of 7,016 orthologs. L. amazonensis and Leishmania mexicana share the largest number of distinct orthologs, while Leishmania braziliensis presented the largest number of inparalogs. Additionally, phylogenomic analysis confirmed the taxonomic position of L. amazonensis within the “Mexicana complex”, reinforcing understanding of the split between New and Old World Leishmania. Potential non-homologous isofunctional enzymes (NISE) were identified between L. amazonensis and Homo sapiens that could provide new targets for drug development. PMID:25336895

  2. Porcine MYF6 gene: sequence, homology analysis, and variation in the promoter region.

    PubMed

    Wyszyńska-Koko, J; Kurył, J

    2004-01-01

    The MYF6 gene codes for a bHLH transcription factor belonging to the MyoD family. Its expression accompanies the differentiation and maturation of myotubes during embryogenesis and continues at a relatively high level after birth, affecting the muscle phenotype. The porcine MYF6 gene was amplified, sequenced and compared with MYF6 gene sequences of other species. The amino acid sequence was deduced and an interspecies homology analysis was performed. Myf-6 protein is highly conserved among species. It shows 99% and 97% identity when pig is compared with cow and human, respectively, and 93% when pig is compared with mouse and rat. A single nucleotide polymorphism (SNP) was found within the promoter region: a T-to-C transition recognized by the MspI restriction enzyme.
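    Identity figures such as the 99%, 97% and 93% quoted above depend on how gaps and alignment length are counted, and the abstract does not state the convention used. The sketch below is minimal and assumes two sequences that are already aligned, with gaps written as "-". It counts identity only over columns where neither sequence has a gap. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: columns containing a gap in either sequence are ignored.
    Other tools divide by the full alignment length or by the length of
    the shorter sequence instead, which gives different numbers.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aln_a.upper(), aln_b.upper())
        if a != gap and b != gap
    ]
    if not columns:
        raise ValueError("alignment has no gap-free columns")
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Hypothetical toy alignment: 4 of 5 gap-free columns are identical.
    print(percent_identity("MKV-LA", "MKVGLS"))  # 80.0
```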

  3. JGI Plant Genomics Gene Annotation Pipeline

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shu, Shengqiang; Rokhsar, Dan; Goodstein, David

    2014-07-14

    Plant genomes vary in size and are highly complex, with a high proportion of repeats, genome duplication and tandem duplication. Genes encode a wealth of information useful for studying an organism, so high-quality, stable gene annotation is critical. Thanks to advances in sequencing technology, the genomes and transcriptomes of many plant species have been sequenced. An automated pipeline is needed to use these very large amounts of sequence data for timely gene annotation or re-annotation. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, aided by an RNA-seq transcriptome assembly pipeline. It uses several gene predictors based on homologous peptides and transcript ORFs. See Methods for details. Here we present the genome annotations that this pipeline produced for the JGI flagship green plants, plus Arabidopsis and rice. The exception is chlamy (Chlamydomonas), which was annotated by a third party. The genome annotations of these and other species are used in our gene family build pipeline and are accessible via the JGI Phytozome portal, whose URL and front-page snapshot are shown below.

  4. The complete genome sequence of human adenovirus 84, a highly recombinant new Human mastadenovirus D type with a unique fiber gene.

    PubMed

    Kaján, Győző L; Kajon, Adriana E; Pinto, Alexis Castillo; Bartha, Dániel; Arnberg, Niklas

    2017-10-15

    A novel human adenovirus was isolated from a pediatric case of acute respiratory disease in Panama City, Panama, in 2011. The clinical isolate was initially identified as an intertypic recombinant based on hexon and fiber gene sequencing. Based on analysis of its complete genome sequence, the novel complex recombinant Human mastadenovirus D (HAdV-D) strain was classified into a new HAdV type, HAdV-84, and designated Adenovirus D human/PAN/P309886/2011/84[P43H17F84]. HAdV-D types usually have an ocular or gastrointestinal tropism, and association with respiratory disease is rarely reported. The virus has a novel fiber type that is most closely related to, but still clearly distant from, that of HAdV-36. The predicted fiber is hypothesised to bind sialic acid with lower affinity than that of HAdV-37. Bioinformatic analysis of the complete genomic sequence of HAdV-84 revealed multiple homologous recombination events and provided deeper insight into HAdV evolution. Copyright © 2017 Elsevier B.V. All rights reserved.

  5. Regions of conservation and divergence in the 3' untranslated sequences of genomic RNA from Ross River virus isolates.

    PubMed

    Faragher, S G; Dalgarno, L

    1986-07-20

    The 3' untranslated (UT) sequences of the genomic RNAs of five geographic variants of the alphavirus Ross River virus (RRV) were determined and compared with the 3' UT sequence of RRV T48, the prototype strain. Part of the 3' UT region of Getah virus, a close serological relative of RRV, was also sequenced. The RRV 3' UT region varies markedly in length between variants. Large deletions or insertions, sequence rearrangements and single nucleotide substitutions are observed. A sequence tract of 49 to 58 nucleotides, which is repeated as four blocks in the RRV T48 3' UT region, occurs only once in the 3' UT region of one RRV strain (NB5092), indicating that the existence of repeat sequence blocks is not essential for RRV replication. However, the precise sequence of the 3' proximal copy of the repeat block and its position relative to the poly(A) tail were identical in all RRV isolates examined, suggesting that it has an important role in RRV replication. Nucleotide substitutions between RRV variants are distributed non-randomly along the length of the 3' UT region. The sequence of 120 to 130 nucleotides adjacent to the poly(A) tail is strongly conserved. Getah virus RNA contains three repeat sequence blocks in the 3' UT region. These are similar in sequence to those in RRV RNA but differ in their arrangement. Homology between the RRV and Getah 3' UT sequences is greatest in the 3' proximal repeat sequence block that shows three differences in 49 nucleotides. The 3' proximal repeat in Getah RNA occurs at the same position, relative to the poly(A) tail, as in all RRV variants. The RRV and Getah virus 3' UT sequences show extensive homology in the region between the 3' proximal repeat and the poly(A) tail but, apart from the repeat blocks themselves, they show no significant homology elsewhere.

  6. Protein 8-class secondary structure prediction using conditional neural fields.

    PubMed

    Wang, Zhiyong; Zhao, Feng; Peng, Jian; Xu, Jinbo

    2011-10-01

    Compared with protein 3-class secondary structure (SS) prediction, 8-class prediction has received less attention and is also much more challenging, especially for proteins with few sequence homologs. This paper presents a new probabilistic method for 8-class SS prediction using conditional neural fields (CNFs), a recently invented probabilistic graphical model. This CNF method not only models the complex relationship between sequence features and SS, but also exploits the interdependency among the SS types of adjacent residues. In addition to sequence profiles, our method also makes use of non-evolutionary information for SS prediction. Tested on the CB513 and RS126 data sets, our method achieves Q8 accuracies of 64.9% and 64.7%, respectively. These are much better than those of the SSpro8 web server (51.0% and 48.0%, respectively). Our method can also be used to predict other structural properties of a protein (e.g. solvent accessibility) or the SS of RNA. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
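    Q8 accuracy is conventionally the proportion of residues whose predicted 8-state label matches the reference (DSSP-derived) label. The sketch below computes it under that assumption, returning a fraction (multiply by 100 for a percentage) and pooling residues over all proteins in a test set. The exact evaluation protocol used for CB513 and RS126 in this study is not given here, for example how unassigned residues are handled. The coil symbol "C" and the function names are assumptions.

```python
# Assumption: the eight DSSP states, with coil written as "C"
# (some datasets use "-", "L" or a blank instead).
Q8_STATES = frozenset("HGIEBTSC")


def _check_states(ss: str) -> None:
    """Reject strings containing symbols outside the assumed 8-state alphabet."""
    bad = set(ss) - Q8_STATES
    if bad:
        raise ValueError(f"unexpected state symbols: {sorted(bad)}")


def q8_accuracy(pairs: list[tuple[str, str]]) -> float:
    """Fraction of residues whose predicted 8-state label equals the true one.

    `pairs` holds (true_ss, predicted_ss) strings, one pair per protein;
    residues are pooled across proteins before dividing.
    """
    correct = total = 0
    for true_ss, pred_ss in pairs:
        if len(true_ss) != len(pred_ss):
            raise ValueError("true and predicted strings differ in length")
        _check_states(true_ss)
        _check_states(pred_ss)
        correct += sum(t == p for t, p in zip(true_ss, pred_ss))
        total += len(true_ss)
    if total == 0:
        raise ValueError("no residues to score")
    return correct / total


if __name__ == "__main__":
    data = [("HHHHEEEC", "HHHGEEEC"), ("CCTT", "CCTS")]
    print(q8_accuracy(data))  # 10 of 12 residues correct
```

    Pooling residues weights long proteins more heavily than averaging per-protein accuracies would. Which of the two a given benchmark reports should be checked before comparing numbers.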

  7. Identification of the WBSCR9 gene, encoding a novel transcriptional regulator, in the Williams-Beuren syndrome deletion at 7q11.23.

    PubMed

    Peoples, R J; Cisco, M J; Kaplan, P; Francke, U

    1998-01-01

    We have identified a novel gene (WBSCR9) within the common Williams-Beuren syndrome (WBS) deletion by interspecies sequence conservation. The WBSCR9 gene encodes a roughly 7-kb transcript with an open reading frame of 1483 amino acids and a predicted protein product of 170.8 kDa. WBSCR9 comprises at least 20 exons extending over 60 kb. The transcript is expressed ubiquitously throughout development and is subject to alternative splicing. Functional motifs identified by sequence homology searches include a bromodomain; a PHD, or C4HC3, finger; several putative nuclear localization signals; four nuclear receptor binding motifs; a polyglutamate stretch; and two PEST sequences. Bromodomains, PHD motifs and nuclear receptor binding motifs are cardinal features of proteins involved in chromatin remodeling and modulation of transcription. Haploinsufficiency for WBSCR9 gene products may contribute to the complex phenotype of WBS through their interactions with tissue-specific regulatory factors during development.

  8. Development of a Reporter System to Explore MMEJ in the Context of Replacing Large Genomic Fragments.

    PubMed

    Yanik, Mert; Ponnam, Surya Prakash Goud; Wimmer, Tobias; Trimborn, Lennart; Müller, Carina; Gambert, Isabel; Ginsberg, Johanna; Janise, Annabella; Domicke, Janina; Wende, Wolfgang; Lorenz, Birgit; Stieger, Knut

    2018-06-01

    Common genome-editing strategies are based either on non-homologous end joining (NHEJ) or, in the presence of a template DNA, on homologous recombination with long (homology-directed repair [HDR]) or short (microhomology-mediated end joining [MMEJ]) homologous sequences. In the current study, we aim to develop a model system to test the activity of MMEJ after CRISPR/Cas9-mediated cleavage in cell culture. Following successful proof of concept in an episome-based reporter system, we tested template plasmids containing a promoter-less luciferase gene flanked by microhomologous sequences (mhs) of different lengths (5, 10, 15, 20, 30, and 50 bp). These mhs are complementary to the mouse retinitis pigmentosa GTPase regulator (RPGR)-ORF15, which is under the control of a CMV promoter stably integrated into a HEK293 cell line. The appearance of a luciferase signal indicated successful recombination events. The signal was highest with 5-bp mhs and lower with longer mhs. In addition, the presence of Csy4 RNase increased the luciferase signal. The luciferase reporter system is a valuable tool to study the contribution of the different DNA repair mechanisms to the replacement of large DNA sequences using mhs. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  9. Nucleotide sequences of two genomic DNAs encoding peroxidase of Arabidopsis thaliana.

    PubMed

    Intapruk, C; Higashimura, N; Yamamoto, K; Okada, N; Shinmyo, A; Takano, M

    1991-02-15

    The peroxidase (EC 1.11.1.7)-encoding gene of Arabidopsis thaliana was screened from a genomic library, and two positive clones were isolated. The probe was a cDNA encoding a neutral isozyme of horseradish (Armoracia rusticana) peroxidase (HRP). Comparison with the sequences of the HRP-encoding genes indicated that both clones contained peroxidase-encoding genes, which were named prxCa and prxEa. Both genes consist of four exons and three introns; the introns have the consensus nucleotides GT and AG at their 5' and 3' ends, respectively. Each putative exon of the prxEa gene has the same length as the corresponding exon of the HRP-basic-isozyme-encoding gene prxC3. prxEa codes for 349 amino acids (aa) with 89% sequence homology to the protein encoded by prxC3. The prxCa gene is very close to the HRP-neutral-isozyme-encoding gene prxC1b and codes for 354 aa with 91% homology to the protein encoded by prxC1b. The aa sequence homology between the two peroxidases encoded by prxCa and prxEa is 64%.

  10. Decorated Heegaard Diagrams and Combinatorial Heegaard Floer Homology

    NASA Astrophysics Data System (ADS)

    Hammarsten, Carl

    Heegaard Floer homology is a collection of invariants for closed oriented three-manifolds, introduced by Ozsvath and Szabo in 2001. The simplest version is defined as the homology of a chain complex coming from a Heegaard diagram of the three-manifold. In the original definition, the differentials count points in certain moduli spaces of holomorphic disks, which are hard to compute in general. More recently, Sarkar and Wang (2006) and Ozsvath, Stipsicz and Szabo (2009) have developed combinatorial methods for computing this homology with $\mathbb{Z}_2$ coefficients. Both methods rely on constructing very specific Heegaard diagrams for the manifold, which are generally very complicated. Let H be a decorated Heegaard diagram for a closed oriented 3-manifold Y, that is, a Heegaard diagram together with a collection of embedded paths satisfying certain criteria. For such an H, we describe a combinatorial recipe for a chain complex $\widehat{CF}'(H)$. If H satisfies some technical constraints, we show that this chain complex is homotopy equivalent to the Heegaard Floer chain complex $\widehat{CF}(H)$. It therefore has the Heegaard Floer homology $\widehat{HF}(Y)$ as its homology groups. Using branched spines, we give an algorithm that constructs, for every closed oriented Y, a decorated Heegaard diagram satisfying the necessary technical constraints. We present this diagram graphically in the form of a strip diagram.

  11. Evidence that a sequence similar to TAR is important for induction of the JC virus late promoter by human immunodeficiency virus type 1 Tat.

    PubMed Central

    Chowdhury, M; Taylor, J P; Chang, C F; Rappaport, J; Khalili, K

    1992-01-01

    A specific RNA sequence located in the leader of all human immunodeficiency virus type 1 (HIV-1) mRNAs, termed the transactivation response element, or TAR, is a primary target for induction of HIV-1 long terminal repeat activity by the HIV-1-derived trans-regulatory protein Tat. The human neurotropic JC virus (JCV) is a causative agent of the degenerative demyelinating disease progressive multifocal leukoencephalopathy. Its late RNA species contain sequences at the 5' end with extensive homology to HIV-1 TAR. In this study, we examined the possible role of the JCV-derived TAR-homologous sequence in Tat-mediated activation of the JCV late promoter (Tada et al., Proc. Natl. Acad. Sci. USA 87:3479-3483, 1990). Site-directed mutagenesis showed that critical G residues required for the function of HIV-1 TAR are conserved in the JCV TAR homolog and play an important role in Tat activation of the JCV promoter. In addition, in vivo competition studies suggest that shared regulatory components mediate Tat activation of the JCV late and HIV-1 long terminal repeat promoters. Furthermore, we showed that the JCV-derived TAR sequence behaves in the same way as HIV-1 TAR in response to two distinct Tat mutants. One of these mutants cannot bind HIV-1 TAR, and the other lacks transcriptional activity on a responsive promoter. These results suggest that the TAR homolog of the JCV late promoter is responsive to HIV-1 Tat induction. It may thus participate in the overall activation of the JCV late promoter mediated by this transactivator. PMID:1331525

  12. Spatial sexual dimorphism of X and Y homolog gene expression in the human central nervous system during early male development.

    PubMed

    Johansson, Martin M; Lundin, Elin; Qian, Xiaoyan; Mirzazadeh, Mohammadreza; Halvardson, Jonatan; Darj, Elisabeth; Feuk, Lars; Nilsson, Mats; Jazin, Elena

    2016-01-01

    Renewed attention has been directed to the functions of the Y chromosome in the central nervous system during early human male development, owing to its recently proposed involvement in neurodevelopmental diseases. PCDH11Y and NLGN4Y are of special interest because they belong to gene families involved in cell fate determination and the formation of dendrites and axons. We used RNA sequencing, immunocytochemistry and a padlock probing and rolling circle amplification strategy to distinguish, for the first time, the expression of X and Y homologs in situ in the human brain. To minimize the influence of androgens on sex differences in the brain, we focused our investigation on human embryos at 8-11 weeks post-gestation. We found that the X- and Y-encoded genes are expressed in specific and heterogeneous cellular subpopulations of both glial and neuronal origin. More importantly, we found differential distribution patterns of X and Y homologs in the developing male central nervous system. This study has visualized the spatial distribution of PCDH11X/Y and NLGN4X/Y in developing human nervous tissue. The observed spatial distribution patterns suggest an additional layer of complexity in the development of the male CNS.

  13. Interactions among Trypanosoma brucei RAD51 paralogues in DNA repair and antigenic variation

    PubMed Central

    Dobson, Rachel; Stockdale, Christopher; Lapsley, Craig; Wilkes, Jonathan; McCulloch, Richard

    2011-01-01

    Homologous recombination in Trypanosoma brucei is used for moving variant surface glycoprotein (VSG) genes into expression sites during immune evasion by antigenic variation. A major route for such VSG switching is gene conversion reactions in which RAD51, a universally conserved recombinase, catalyses homology-directed strand exchange. In any eukaryote, RAD51-directed strand exchange in vivo is mediated by further factors, including RAD51-related proteins termed Rad51 paralogues. These appear to be ubiquitously conserved, although their detailed roles in recombination remain unclear. In T. brucei, four putative RAD51 paralogue genes have been identified by sequence homology. Here we show that all four RAD51 paralogues act in DNA repair, recombination and RAD51 subnuclear dynamics, though not equivalently, while mutation of only one RAD51 paralogue gene significantly impedes VSG switching. We also show that the T. brucei RAD51 paralogues interact, and that the complexes they form may explain the distinct phenotypes of the mutants as well as observed expression interdependency. Finally, we document the Rad51 paralogues that are encoded by a wide range of protists, demonstrating that the Rad51 paralogue repertoire in T. brucei is unusually large among microbial eukaryotes and that one member of the protein family corresponds with a key, conserved eukaryotic Rad51 paralogue. PMID:21615552

  14. Development of versatile non-homologous end joining-based knock-in module for genome editing.

    PubMed

    Sawatsubashi, Shun; Joko, Yudai; Fukumoto, Seiji; Matsumoto, Toshio; Sugano, Shigeo S

    2018-01-12

    CRISPR/Cas9-based genome editing has dramatically accelerated genome engineering. An important aspect of genome engineering is efficient knock-in technology. For improved knock-in efficiency, the non-homologous end joining (NHEJ) repair pathway has been used instead of the homology-dependent repair pathway, but there remains a need to reduce the complexity of preparing donor vectors. We developed the versatile NHEJ-based knock-in module for genome editing (VIKING). Because donor vectors are cut at a consensus sequence of the time-honored pUC vector, any vector with a pUC backbone can be used as the donor vector without customization. Conditions required to minimize random integration of the donor vector were also investigated. We attempted to isolate null lines of the VDR gene in human HaCaT keratinocytes by knock-in/knock-out with a selection marker cassette. Of the isolated clones, 75% were successfully knocked in. Although HaCaT cells have a hypotetraploid genome, the results suggest that multiple clones have VDR null phenotypes. VIKING modules enabled highly efficient knock-in of any vector harboring a pUC backbone. Users can now insert various existing vectors into an arbitrary locus in the genome. VIKING will contribute to low-cost genome engineering.

  15. Expression of an Atriplex nummularia gene encoding a protein homologous to the bacterial molecular chaperone DnaJ.

    PubMed

    Zhu, J K; Shi, J; Bressan, R A; Hasegawa, P M

    1993-03-01

    DnaJ is a 36-kD heat shock protein that functions together with DnaK (Hsp70) as a molecular chaperone in Escherichia coli. We have obtained a cDNA clone from the higher plant Atriplex nummularia that encodes a 46.6-kD polypeptide (ANJ1) with 35.2% overall amino acid sequence identity to E. coli DnaJ. ANJ1 has 43.4% overall sequence identity with the Saccharomyces cerevisiae cytoplasmic DnaJ homolog YDJ1/MAS5. Complementation of the yeast mas5 mutation indicated that ANJ1 is a functional homolog of YDJ1/MAS5. The presence of other DnaJ homologs in A. nummularia was demonstrated by the detection of proteins that are antigenically related to the yeast mitochondrial DnaJ homolog SCJ1 and the yeast DnaJ-related protein Sec63. Expression of the ANJ1 gene was compared with that of an A. nummularia Hsp70 gene. Expression of both ANJ1 and Hsp70 transcripts was coordinately induced by heat shock. However, noncoordinate accumulation of ANJ1 and Hsp70 mRNAs occurred during the cell growth cycle and in response to NaCl stress.

  16. In trans paired nicking triggers seamless genome editing without double-stranded DNA cutting.

    PubMed

    Chen, Xiaoyu; Janssen, Josephine M; Liu, Jin; Maggio, Ignazio; 't Jong, Anke E J; Mikkers, Harald M M; Gonçalves, Manuel A F V

    2017-09-22

    Precise genome editing involves homologous recombination between donor DNA and chromosomal sequences subjected to double-stranded DNA breaks made by programmable nucleases. Ideally, genome editing should be efficient, specific, and accurate. However, double-stranded DNA breaks (targeted or otherwise) are potential translocation-initiating lesions. They are also mostly repaired through unpredictable and mutagenic non-homologous recombination processes. Here, we report that coordinated formation of paired single-stranded DNA breaks, or nicks, at donor plasmids and chromosomal target sites triggers seamless homology-directed gene targeting of large genetic payloads in human cells, including pluripotent stem cells. The nicks are made by RNA-guided nucleases based on CRISPR-Cas9 components. Importantly, this in trans paired nicking strategy significantly reduces the mutagenicity of the genome modification procedure. It also achieves multiplexed, single-step gene targeting and yields higher frequencies of accurately edited cells than the standard double-stranded DNA break-dependent approach. CRISPR-Cas9-based gene editing involves double-strand breaks at target sequences, which are often repaired by mutagenic non-homologous end-joining. Here the authors use Cas9 nickases to generate coordinated single-strand breaks in donor and target DNA for precise homology-directed gene editing.

  17. Comparative analysis of the prion protein gene sequences in African lion.

    PubMed

    Wu, Chang-De; Pang, Wan-Yong; Zhao, De-Ming

    2006-10-01

    The prion protein gene of the African lion (Panthera leo) was cloned for the first time and screened for polymorphisms. The results suggest that the prion protein gene is highly homogeneous among the eight African lions studied. The amino acid sequences of the prion protein (PrP) were identical in all samples tested. Four single nucleotide polymorphisms (C42T, C81A, C420T, T600C) were found in the prion protein gene (Prnp) of the African lion, but none causes an amino acid substitution. Sequence analysis showed the highest homology to Felis catus (GenBank AF003087; 96.7%) and to sheep (GenBank M31313.1; 96.2%). Compared with all the mammalian prion protein sequences examined, the African lion prion protein sequence has three amino acid substitutions. These differences might in turn affect the potential intermolecular interactions critical for cross-species transmission of prion disease.

  18. Analysis of the DNA sequence of a 15,500 bp fragment near the left telomere of chromosome XV from Saccharomyces cerevisiae reveals a putative sugar transporter, a carboxypeptidase homologue and two new open reading frames.

    PubMed

    Gamo, F J; Lafuente, M J; Casamayor, A; Ariño, J; Aldea, M; Casas, C; Herrero, E; Gancedo, C

    1996-06-15

    We report the sequence of a 15.5 kb DNA segment located near the left telomere of chromosome XV of Saccharomyces cerevisiae. The sequence contains nine open reading frames (ORFs) longer than 300 bp, three of which lie within other ORFs. One corresponds to the gene LGT3, which encodes a putative sugar transporter. Three adjacent ORFs were separated by two in-frame stop codons; these ORFs showed homology to the gene CPS1, which encodes carboxypeptidase S. The stop codons were absent from the corresponding sequence derived from another yeast strain. Two other ORFs without significant homology in the databases were also found. One of them, O0420, is very rich in serine and threonine and contains a series of repeated or similar amino acid stretches along its sequence.
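    ORF calls like the nine "longer than 300 bp" reported above are typically made by scanning all six reading frames of a sequence. The minimal six-frame scan below rests on assumptions, because the abstract does not state its criteria:

    - An ORF runs from an ATG to the first in-frame TAA, TAG or TGA.
    - Its length includes the stop codon.
    - A later ATG in the same frame as an open ORF is not reported separately. ORFs on other frames or strands that overlap it are still reported.

    The function names and the toy sequence are hypothetical.

```python
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (other letters kept as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def _scan_strand(seq: str, min_bp: int) -> list[tuple[int, int]]:
    """(start, end) of ATG...stop ORFs longer than min_bp on one strand.

    Coordinates are 0-based and half-open; the length counts the stop codon.
    Within a frame, an ORF starts at the first ATG after the previous stop.
    """
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i : i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                if i + 3 - start > min_bp:
                    hits.append((start, i + 3))
                start = None
    return hits


def find_orfs(seq: str, min_bp: int = 300) -> list[tuple[str, int, int]]:
    """ORFs on both strands as (strand, start, end) in forward-strand coordinates."""
    seq = seq.upper()
    n = len(seq)
    orfs = [("+", s, e) for s, e in _scan_strand(seq, min_bp)]
    orfs += [("-", n - e, n - s) for s, e in _scan_strand(reverse_complement(seq), min_bp)]
    return sorted(orfs, key=lambda orf: (orf[1], orf[2]))


if __name__ == "__main__":
    toy = "ATG" + "GCT" * 100 + "TAA"  # hypothetical 306-bp ORF
    print(find_orfs(toy))  # [('+', 0, 306)]
```

    Real annotation pipelines add further rules, such as alternative start codons and handling of ambiguous bases. This sketch only illustrates the frame-by-frame scan.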

  19. Nucleotide sequence analysis of the L gene of Newcastle disease virus: homologies with Sendai and vesicular stomatitis viruses.

    PubMed Central

    Yusoff, K; Millar, N S; Chambers, P; Emmerson, P T

    1987-01-01

    The nucleotide sequence of the L gene of the Beaudette C strain of Newcastle disease virus (NDV) has been determined. The L gene is 6704 nucleotides long and encodes a protein of 2204 amino acids with a calculated molecular weight of 248,822. Mung bean nuclease mapping of the 5' terminus of the L gene mRNA indicates that transcription of the L gene is initiated 11 nucleotides upstream of the translational start site. Comparison with the amino acid sequences of the L genes of Sendai virus and vesicular stomatitis virus (VSV) suggests that there are several regions of homology between the sequences. These data provide further evidence for an evolutionary relationship between the Paramyxoviridae and the Rhabdoviridae. A non-coding sequence of 46 nucleotides downstream of the presumed polyadenylation site of the L gene may be part of a negative-strand leader RNA. PMID:3035486

  20. Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana.

    PubMed

    Mayer, K; Schüller, C; Wambutt, R; Murphy, G; Volckaert, G; Pohl, T; Düsterhöft, A; Stiekema, W; Entian, K D; Terryn, N; Harris, B; Ansorge, W; Brandt, P; Grivell, L; Rieger, M; Weichselgartner, M; de Simone, V; Obermaier, B; Mache, R; Müller, M; Kreis, M; Delseny, M; Puigdomenech, P; Watson, M; Schmidtheini, T; Reichert, B; Portatelle, D; Perez-Alonso, M; Boutry, M; Bancroft, I; Vos, P; Hoheisel, J; Zimmermann, W; Wedler, H; Ridley, P; Langham, S A; McCullagh, B; Bilham, L; Robben, J; Van der Schueren, J; Grymonprez, B; Chuang, Y J; Vandenbussche, F; Braeken, M; Weltjens, I; Voet, M; Bastiaens, I; Aert, R; Defoor, E; Weitzenegger, T; Bothe, G; Ramsperger, U; Hilbert, H; Braun, M; Holzer, E; Brandt, A; Peters, S; van Staveren, M; Dirske, W; Mooijman, P; Klein Lankhorst, R; Rose, M; Hauf, J; Kötter, P; Berneiser, S; Hempel, S; Feldpausch, M; Lamberth, S; Van den Daele, H; De Keyser, A; Buysshaert, C; Gielen, J; Villarroel, R; De Clercq, R; Van Montagu, M; Rogers, J; Cronin, A; Quail, M; Bray-Allen, S; Clark, L; Doggett, J; Hall, S; Kay, M; Lennard, N; McLay, K; Mayes, R; Pettett, A; Rajandream, M A; Lyne, M; Benes, V; Rechmann, S; Borkova, D; Blöcker, H; Scharfe, M; Grimm, M; Löhnert, T H; Dose, S; de Haan, M; Maarse, A; Schäfer, M; Müller-Auer, S; Gabel, C; Fuchs, M; Fartmann, B; Granderath, K; Dauner, D; Herzl, A; Neumann, S; Argiriou, A; Vitale, D; Liguori, R; Piravandi, E; Massenet, O; Quigley, F; Clabauld, G; Mündlein, A; Felber, R; Schnabl, S; Hiller, R; Schmidt, W; Lecharny, A; Aubourg, S; Chefdor, F; Cooke, R; Berger, C; Montfort, A; Casacuberta, E; Gibbons, T; Weber, N; Vandenbol, M; Bargues, M; Terol, J; Torres, A; Perez-Perez, A; Purnelle, B; Bent, E; Johnson, S; Tacon, D; Jesse, T; Heijnen, L; Schwarz, S; Scholler, P; Heber, S; Francs, P; Bielke, C; Frishman, D; Haase, D; Lemcke, K; Mewes, H W; Stocker, S; Zaccaria, P; Bevan, M; Wilson, R K; de la Bastide, M; Habermann, K; Parnell, L; Dedhia, N; Gnoj, L; Schutz, K; Huang, E; Spiegel, L; Sehkon, M; 
Murray, J; Sheet, P; Cordes, M; Abu-Threideh, J; Stoneking, T; Kalicki, J; Graves, T; Harmon, G; Edwards, J; Latreille, P; Courtney, L; Cloud, J; Abbott, A; Scott, K; Johnson, D; Minx, P; Bentley, D; Fulton, B; Miller, N; Greco, T; Kemp, K; Kramer, J; Fulton, L; Mardis, E; Dante, M; Pepin, K; Hillier, L; Nelson, J; Spieth, J; Ryan, E; Andrews, S; Geisel, C; Layman, D; Du, H; Ali, J; Berghoff, A; Jones, K; Drone, K; Cotton, M; Joshu, C; Antonoiu, B; Zidanic, M; Strong, C; Sun, H; Lamar, B; Yordan, C; Ma, P; Zhong, J; Preston, R; Vil, D; Shekher, M; Matero, A; Shah, R; Swaby, I K; O'Shaughnessy, A; Rodriguez, M; Hoffmann, J; Till, S; Granat, S; Shohdy, N; Hasegawa, A; Hameed, A; Lodhi, M; Johnson, A; Chen, E; Marra, M; Martienssen, R; McCombie, W R

    1999-12-16

    The higher plant Arabidopsis thaliana (Arabidopsis) is an important model for identifying plant genes and determining their function. To assist biological investigations and to define chromosome structure, a coordinated effort to sequence the Arabidopsis genome was initiated in late 1996. Here we report one of the first milestones of this project, the sequence of chromosome 4. Analysis of 17.38 megabases of unique sequence, representing about 17% of the genome, reveals 3,744 protein coding genes, 81 transfer RNAs and numerous repeat elements. Heterochromatic regions surrounding the putative centromere, which has not yet been completely sequenced, are characterized by an increased frequency of a variety of repeats, new repeats, reduced recombination, lowered gene density and lowered gene expression. Roughly 60% of the predicted protein-coding genes have been functionally characterized on the basis of their homology to known genes. Many genes encode predicted proteins that are homologous to human and Caenorhabditis elegans proteins.

  1. Structural basis of toxicity and immunity in contact-dependent growth inhibition (CDI) systems.

    PubMed

    Morse, Robert P; Nikolakakis, Kiel C; Willett, Julia L E; Gerrick, Elias; Low, David A; Hayes, Christopher S; Goulding, Celia W

    2012-12-26

    Contact-dependent growth inhibition (CDI) systems encode polymorphic toxin/immunity proteins that mediate competition between neighboring bacterial cells. We present crystal structures of CDI toxin/immunity complexes from Escherichia coli EC869 and Burkholderia pseudomallei 1026b. Despite sharing little sequence identity, the toxin domains are structurally similar and have homology to endonucleases. The EC869 toxin is a Zn(2+)-dependent DNase capable of completely degrading the genomes of target cells, whereas the Bp1026b toxin cleaves the aminoacyl acceptor stems of tRNA molecules. Each immunity protein binds and inactivates its cognate toxin in a unique manner. The EC869 toxin/immunity complex is stabilized through an unusual β-augmentation interaction. In contrast, the Bp1026b immunity protein exploits shape and charge complementarity to occlude the toxin active site. These structures provide a first glimpse into the CDI toxin/immunity network, illustrating how sequence-diverse toxins adopt convergent folds yet retain distinct binding interactions with cognate immunity proteins. Moreover, we present a visual demonstration of CDI toxin delivery into a target cell.

  2. Rapid cloning of genes in hexaploid wheat using cultivar-specific long-range chromosome assembly.

    PubMed

    Thind, Anupriya Kaur; Wicker, Thomas; Šimková, Hana; Fossati, Dario; Moullet, Odile; Brabant, Cécile; Vrána, Jan; Doležel, Jaroslav; Krattinger, Simon G

    2017-08-01

    Cereal crops such as wheat and maize have large repeat-rich genomes that make cloning of individual genes challenging. Moreover, gene order and gene sequences often differ substantially between cultivars of the same crop species. A major bottleneck for gene cloning in cereals is the generation of high-quality sequence information from a cultivar of interest. In order to accelerate gene cloning from any cropping line, we report 'targeted chromosome-based cloning via long-range assembly' (TACCA). TACCA combines lossless genome-complexity reduction via chromosome flow sorting with Chicago long-range linkage to assemble complex genomes. We applied TACCA to produce a high-quality (N50 of 9.76 Mb) de novo chromosome assembly of the wheat line CH Campala Lr22a in only 4 months. Using this assembly we cloned the broad-spectrum Lr22a leaf-rust resistance gene, using molecular marker information and ethyl methanesulfonate (EMS) mutants, and found that Lr22a encodes an intracellular immune receptor homologous to the Arabidopsis thaliana RPM1 protein.

  3. External and semi-internal controls for PCR amplification of homologous sequences in mixed templates.

    PubMed

    Kalle, Elena; Gulevich, Alexander; Rensing, Christopher

    2013-11-01

    In a mixed template, the presence of homologous target DNA sequences creates environments that almost inevitably give rise to artifacts and biases during PCR. Heteroduplexes, chimeras, and skewed template-to-product ratios are the exclusive attributes of mixed template PCR and never occur in a single template assay. Yet, multi-template PCR has been used without appropriate attention to quality control and assay validation, in spite of the fact that such practice diminishes the reliability of results. External and internal amplification controls have become obligatory elements of good laboratory practice in different PCR assays. We propose the inclusion of an analogous approach as a quality control system for multi-template PCR applications. The amplification controls must take into account the characteristics of multi-template PCR and be able to effectively monitor particular assay performance. This study demonstrated the efficiency of a model mixed template as an adequate external amplification control for a particular PCR application. The conditions of multi-template PCR do not allow implementation of a classic internal control; therefore we developed a convenient semi-internal control as an acceptable alternative. In order to evaluate the effects of inhibitors, a model multi-template mix was amplified in a mixture with DNase-treated sample. Semi-internal control allowed establishment of intervals for robust PCR performance for different samples, thus enabling correct comparison of the samples. The complexity of the external and semi-internal amplification controls must be comparable with the assumed complexity of the samples. We also emphasize that amplification controls should be applied in multi-template PCR regardless of the post-assay method used to analyze products. © 2013 Elsevier B.V. All rights reserved.

  4. Transmission of the PabI family of restriction DNA glycosylase genes: mobility and long-term inheritance.

    PubMed

    Kojima, Kenji K; Kobayashi, Ichizo

    2015-10-19

    R.PabI is an exceptional restriction enzyme that functions as a DNA glycosylase. The enzyme excises an unmethylated base from its recognition sequence to generate apurinic/apyrimidinic (AP) sites, and also displays AP lyase activity, cleaving the DNA backbone at the AP site to generate a 3'-phospho-α,β-unsaturated aldehyde end in addition to a 5'-phosphate end. The resulting ends are difficult to religate with DNA ligase. The enzyme was originally isolated from Pyrococcus, a hyperthermophilic archaeon, and additional homologs were subsequently identified in the epsilon class of the Gram-negative bacterial phylum Proteobacteria, such as Helicobacter pylori. Systematic analysis of R.PabI homologs and their neighboring genes in sequenced genomes revealed co-occurrence of R.PabI with M.PabI homolog methyltransferase genes. R.PabI and M.PabI homolog genes are occasionally found at corresponding (orthologous) loci in different species, such as Helicobacter pylori, Helicobacter acinonychis and Helicobacter cetorum, indicating long-term maintenance of the gene pair. One R.PabI and M.PabI homolog gene pair is observed immediately after the GMP synthase gene in both Campylobacter and Helicobacter, representing orthologs beyond genera. The mobility of the PabI family of restriction-modification (RM) systems between genomes is evident upon comparison of genomes of sibling strains/species. Analysis of R.PabI and M.PabI homologs in H. pylori revealed an insertion of integrative and conjugative elements (ICE), and replacement with a gene of unknown function that may specify a membrane-associated toxin (hrgC). In view of the similarity of HrgC with toxins in type I toxin-antitoxin systems, we addressed the biological significance of this substitution. Our data indicate that replacement with hrgC occurred in the common ancestor of hspAmerind and hspEAsia. Subsequently, H. pylori with and without hrgC were intermixed at this locus, leading to a complex distribution of hrgC in East Asia and the Americas. In Malaysia, hrgC was horizontally transferred from hspEAsia to hpAsia2 strains. The PabI family of RM systems behaves as a mobile, selfish genetic element, similar to the other families of Type II RM systems. Our analysis additionally revealed some cases of long-term inheritance. The distribution of the hrgC gene replacing the PabI family in the subpopulations of H. pylori, hspAmerind, hspEAsia and hpAsia2, corresponds to two human migration events, one from East Asia to the Americas and the other from China to Malaysia.

  5. Annotating Protein Functional Residues by Coupling High-Throughput Fitness Profile and Homologous-Structure Analysis.

    PubMed

    Du, Yushen; Wu, Nicholas C; Jiang, Lin; Zhang, Tianhao; Gong, Danyang; Shu, Sara; Wu, Ting-Ting; Sun, Ren

    2016-11-01

    Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available. To fully comprehend the diverse functions of a protein, it is essential to understand the functionality of individual residues. 
Current methods are highly dependent on evolutionary sequence conservation, which is usually limited by sampling size. Sequence conservation-based methods are further confounded by structural constraints and multifunctionality of proteins. Here we present a method that can systematically identify and annotate functional residues of a given protein. We used a high-throughput functional profiling platform to identify essential residues. Coupling it with homologous-structure comparison, we were able to annotate multiple functions of proteins. We demonstrated the method with the PB1 protein of influenza A virus and identified novel functional residues in addition to its canonical function as an RNA-dependent RNA polymerase. Not limited to virology, this method is generally applicable to other proteins that can be functionally selected and about which homologous-structure information is available. Copyright © 2016 Du et al.

  6. Chromosomal localization of three repair genes: The xeroderma pigmentosum group C gene and two human homologs of yeast RAD23

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Spek, P.J. van der; Smit, E.M.E.; Beverloo, H.B.

    1994-10-01

    The nucleotide excision repair (NER) disorder xeroderma pigmentosum (XP) is characterized by sun (UV) sensitivity, predisposition to skin cancer, and extensive genetic heterogeneity. Recently, we reported the cloning and analysis of three human NER genes, XPC, HHR23A, and HHR23B. The previously cloned XPC gene is involved in the common XP complementation group C, which is defective in excision repair of nontranscribed sequences in the genome. The XPC protein was found to be complexed with the product of HHR23B, one of the two human homologs of the Saccharomyces cerevisiae NER gene RAD23. Here we present the chromosomal localization of all three genes by in situ hybridization using haptenized probes. The HHR23A gene was assigned to chromosome 19p13.2. Interestingly, the HHR23B and XPC genes, whose products form a tight complex, were found to colocalize on band 3p25.1. Pulsed-field gel electrophoresis revealed that the HHR23B and XPC genes possibly share a MluI restriction fragment of about 625 kb. Potential involvement of the HHR23 genes in human genetic disorders is discussed. 53 refs., 4 figs., 2 tabs.

  7. Predictive Bcl-2 Family Binding Models Rooted in Experiment or Structure

    PubMed Central

    DeBartolo, Joe; Dutta, Sanjib; Reich, Lothar; Keating, Amy E.

    2013-01-01

    Proteins of the Bcl-2 family either enhance or suppress programmed cell death and are centrally involved in cancer development and resistance to chemotherapy. BH3 (Bcl-2 homology 3)-only Bcl-2 proteins promote cell death by docking an α-helix into a hydrophobic groove on the surface of one or more of five pro-survival Bcl-2 receptor proteins. There is high structural homology within the pro-death and pro-survival families, yet a high degree of interaction specificity is nevertheless encoded, posing an interesting and important molecular recognition problem. Understanding protein features that dictate Bcl-2 interaction specificity is critical for designing peptide-based cancer therapeutics and diagnostics. In this study, we present peptide SPOT arrays and deep sequencing data from yeast display screening experiments that significantly expand the BH3 sequence space that has been experimentally tested for interaction with five human anti-apoptotic receptors. These data provide rich information about the determinants of Bcl-2 family specificity. To interpret and use the information, we constructed two simple data-based models that can predict affinity and specificity when evaluated on independent data sets within a limited sequence space. We also constructed a novel structure-based statistical potential, called STATIUM, which is remarkably good at predicting Bcl-2 affinity and specificity, especially considering it is not trained on experimental data. We compare the performance of our three models to each other and to alternative structure-based methods and discuss how such tools can guide prediction and design of new Bcl-2 family complexes. PMID:22617328

  8. PMS2 inactivation by a complex rearrangement involving an HERV retroelement and the inverted 100-kb duplicon on 7p22.1.

    PubMed

    Vogt, Julia; Wernstedt, Annekatrin; Ripperger, Tim; Pabst, Brigitte; Zschocke, Johannes; Kratz, Christian; Wimmer, Katharina

    2016-11-01

    Biallelic PMS2 mutations are responsible for more than half of all cases of constitutional mismatch repair deficiency (CMMRD), a recessively inherited childhood cancer predisposition syndrome. The mismatch repair gene PMS2 is partly embedded within one copy of an inverted 100-kb low-copy repeat (LCR) on 7p22.1. In an individual with CMMRD syndrome, PMS2 was found to be homozygously inactivated by a complex chromosomal rearrangement, which separates the 5'-part from the 3'-part of the gene. The rearrangement involves sequences of the inverted 100-kb LCR and a human endogenous retrovirus element and may be associated with an inversion that is indistinguishable from the known inversion polymorphism affecting the ~0.7-Mb sequence intervening the LCR. Its formation is best explained by a replication-based mechanism (RBM) such as fork stalling and template switching/microhomology-mediated break-induced replication (FoSTeS/MMBIR). This finding supports the hypothesis that the inverted LCR not only facilitates the formation of the non-allelic homologous recombination-mediated inversion polymorphism but also promotes more complex rearrangements, which can likewise be associated with a large inversion but are mediated by an RBM. This further suggests that more complex rearrangements might be hidden among the inversion polymorphisms on 7p22.1. Furthermore, as the locus is embedded in a common fragile site (CFS) region, this rearrangement also supports the recently raised hypothesis that CFS sequence motifs may facilitate replication-based rearrangement mechanisms.

  9. PMS2 inactivation by a complex rearrangement involving an HERV retroelement and the inverted 100-kb duplicon on 7p22.1

    PubMed Central

    Vogt, Julia; Wernstedt, Annekatrin; Ripperger, Tim; Pabst, Brigitte; Zschocke, Johannes; Kratz, Christian; Wimmer, Katharina

    2016-01-01

    Biallelic PMS2 mutations are responsible for more than half of all cases of constitutional mismatch repair deficiency (CMMRD), a recessively inherited childhood cancer predisposition syndrome. The mismatch repair gene PMS2 is partly embedded within one copy of an inverted 100-kb low-copy repeat (LCR) on 7p22.1. In an individual with CMMRD syndrome, PMS2 was found to be homozygously inactivated by a complex chromosomal rearrangement, which separates the 5′-part from the 3′-part of the gene. The rearrangement involves sequences of the inverted 100-kb LCR and a human endogenous retrovirus element and may be associated with an inversion that is indistinguishable from the known inversion polymorphism affecting the ~0.7-Mb sequence intervening the LCR. Its formation is best explained by a replication-based mechanism (RBM) such as fork stalling and template switching/microhomology-mediated break-induced replication (FoSTeS/MMBIR). This finding supports the hypothesis that the inverted LCR not only facilitates the formation of the non-allelic homologous recombination-mediated inversion polymorphism but also promotes more complex rearrangements, which can likewise be associated with a large inversion but are mediated by an RBM. This further suggests that more complex rearrangements might be hidden among the inversion polymorphisms on 7p22.1. Furthermore, as the locus is embedded in a common fragile site (CFS) region, this rearrangement also supports the recently raised hypothesis that CFS sequence motifs may facilitate replication-based rearrangement mechanisms. PMID:27329736

  10. Repertoire, genealogy and genomic organization of cruzipain and homologous genes in Trypanosoma cruzi, T. cruzi-like and other trypanosome species.

    PubMed

    Lima, Luciana; Ortiz, Paola A; da Silva, Flávia Maia; Alves, João Marcelo P; Serrano, Myrna G; Cortez, Alane P; Alfieri, Silvia C; Buck, Gregory A; Teixeira, Marta M G

    2012-01-01

    Trypanosoma cruzi, the agent of Chagas disease, is a complex of genetically diverse isolates highly phylogenetically related to T. cruzi-like species, Trypanosoma cruzi marinkellei and Trypanosoma dionisii, all sharing morphology of blood and culture forms and development within cells. However, they differ in hosts, vectors and pathogenicity: T. cruzi is a human pathogen infective to virtually all mammals whilst the other two species are non-pathogenic and bat restricted. Previous studies suggest that variations in expression levels and genetic diversity of cruzipain, the major isoform of cathepsin L-like (CATL) enzymes of T. cruzi, correlate with levels of cellular invasion, differentiation, virulence and pathogenicity of distinct strains. In this study, we compared 80 sequences of genes encoding cruzipain from 25 T. cruzi isolates representative of all discrete typing units (DTUs TcI-TcVI) and the new genotype Tcbat and 10 sequences of homologous genes from other species. The catalytic domain repertoires diverged according to DTUs and trypanosome species. Relatively homogeneous sequences are found within and among isolates of the same DTU except TcV and TcVI, which displayed sequences unique or identical to those of TcII and TcIII, supporting their origin from the hybridization between these two DTUs. In network genealogies, sequences from T. cruzi clustered tightly together and closer to T. c. marinkellei than to T. dionisii and largely differed from homologues of T. rangeli and T. b. brucei. Here, analysis of isolates representative of the overall biological and genetic diversity of T. cruzi and closest T. cruzi-like species evidenced DTU- and species-specific polymorphisms corroborating phylogenetic relationships inferred with other genes. Comparison of both phylogenetically close and distant trypanosomes is valuable to understand host-parasite interactions, virulence and pathogenicity. 
Our findings corroborate cruzipain as a valuable target for drug, vaccine, diagnostic and genotyping approaches.

  11. New acute transforming feline retrovirus with fms homology specifies a C-terminally truncated version of the c-fms protein that is different from SM-feline sarcoma virus v-fms protein

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Besmer, P.; Lader, E.; George, P.C.

    1986-10-01

    The HZ5-feline sarcoma virus (FeSV) is a new acute transforming feline retrovirus which was isolated from a multicentric fibrosarcoma of a domestic cat. The HZ5-FeSV transforms fibroblasts in vitro and is replication defective. A biologically active integrated HZ5-FeSV provirus was molecularly cloned from cellular DNA of HZ5-FeSV-infected FRE-3A rat cells. The HZ5-FeSV has oncogene homology with the fms sequences of the SM-FeSV. The genome organization of the 8.6-kilobase HZ5-FeSV provirus is 5' Δgag-fms-Δpol-Δenv 3'. The HZ5- and SM-FeSVs display indistinguishable in vitro transformation characteristics, and the structures of the gag-fms transforming genes in the two viruses are very similar. In the HZ5-FeSV and the SM-FeSV, identical c-fms and feline leukemia virus p10 sequences form the 5' gag-fms junction. With regard to v-fms, the two viruses are homologous up to 11 amino acids before the C terminus of the SM-FeSV v-fms protein. In HZ5-FeSV, a segment of 362 nucleotides then follows before the 3' recombination site with feline leukemia virus pol. The new 3' v-fms sequence encodes 27 amino acids before reaching a TGA termination signal. The relationship of this sequence with the recently characterized human c-fms sequence has been examined. The 3' HZ5-FeSV v-fms sequence is homologous with 3' c-fms sequences. A frameshift mutation (11-base-pair deletion) was found in the C-terminal fms coding sequence of the HZ5-FeSV. As a result, the HZ5-FeSV v-fms protein is predicted to be a C-terminally truncated version of c-fms. This frameshift mutation may determine the oncogenic properties of v-fms in the HZ5-FeSV.

  12. A new acute transforming feline retrovirus with fms homology specifies a C-terminally truncated version of the c-fms protein that is different from SM-feline sarcoma virus v-fms protein.

    PubMed Central

    Besmer, P; Lader, E; George, P C; Bergold, P J; Qiu, F H; Zuckerman, E E; Hardy, W D

    1986-01-01

    The HZ5-feline sarcoma virus (FeSV) is a new acute transforming feline retrovirus which was isolated from a multicentric fibrosarcoma of a domestic cat. The HZ5-FeSV transforms fibroblasts in vitro and is replication defective. A biologically active integrated HZ5-FeSV provirus was molecularly cloned from cellular DNA of HZ5-FeSV-infected FRE-3A rat cells. The HZ5-FeSV has oncogene homology with the fms sequences of the SM-FeSV. The genome organization of the 8.6-kilobase HZ5-FeSV provirus is 5' Δgag-fms-Δpol-Δenv 3'. The HZ5- and SM-FeSVs display indistinguishable in vitro transformation characteristics, and the structures of the gag-fms transforming genes in the two viruses are very similar. In the HZ5-FeSV and the SM-FeSV, identical c-fms and feline leukemia virus p10 sequences form the 5' gag-fms junction. With regard to v-fms, the two viruses are homologous up to 11 amino acids before the C terminus of the SM-FeSV v-fms protein. In HZ5-FeSV, a segment of 362 nucleotides then follows before the 3' recombination site with feline leukemia virus pol. The new 3' v-fms sequence encodes 27 amino acids before reaching a TGA termination signal. The relationship of this sequence with the recently characterized human c-fms sequence has been examined. The 3' HZ5-FeSV v-fms sequence is homologous with 3' c-fms sequences. A frameshift mutation (11-base-pair deletion) was found in the C-terminal fms coding sequence of the HZ5-FeSV. As a result, the HZ5-FeSV v-fms protein is predicted to be a C-terminally truncated version of c-fms. This frameshift mutation may determine the oncogenic properties of v-fms in the HZ5-FeSV. PMID:3018286

  13. A Bioinformatics Classifier and Database for Heme-Copper Oxygen Reductases

    PubMed Central

    Sousa, Filipa L.; Alves, Renato J.; Pereira-Leal, José B.; Teixeira, Miguel; Pereira, Manuela M.

    2011-01-01

    Background Heme-copper oxygen reductases (HCOs) are the last enzymatic complexes of most aerobic respiratory chains, reducing dioxygen to water and translocating up to four protons across the inner mitochondrial membrane (eukaryotes) or cytoplasmic membrane (prokaryotes). The number of completely sequenced genomes is expanding exponentially, and with it the number and taxonomic distribution of HCO sequences. These enzymes were initially classified into three different types, a classification that has recently been challenged. Methodology We reanalyzed the classification scheme and developed a new bioinformatics classifier for the HCO and nitric oxide reductases (NOR), which we benchmark against a manually derived gold standard sequence set. It is able to classify any given sequence of subunit I from HCO and NOR with a global recall and precision both of 99.8%. We use this tool to classify this protein family in 552 completely sequenced genomes. Conclusions We concluded that the new and broader data set supports three functional and evolutionary groups of HCOs. Homology between NORs and HCOs is shown, and the closest relationship of NORs to C-type HCOs is demonstrated. We established and made available a classification web tool and an integrated Heme-Copper Oxygen reductase and NOR protein database (www.evocell.org/hco). PMID:21559461

  14. Sequence of the bchG gene from Chloroflexus aurantiacus: relationship between chlorophyll synthase and other polyprenyltransferases

    NASA Technical Reports Server (NTRS)

    Lopez, J. C.; Ryan, S.; Blankenship, R. E.

    1996-01-01

    The sequence of the Chloroflexus aurantiacus open reading frame thought to be the C. aurantiacus homolog of the Rhodobacter capsulatus bchG gene is reported. The BchG gene product catalyzes esterification of bacteriochlorophyllide a by geranylgeraniol-PPi during bacteriochlorophyll a biosynthesis. Homologs from Arabidopsis thaliana, Synechocystis sp. strain PCC6803, and C. aurantiacus were identified in database searches. Profile analysis identified three related polyprenyltransferase enzymes which attach an aliphatic alcohol PPi to an aromatic substrate. This suggests a broader relationship between chlorophyll synthases and other polyprenyltransferases.

  15. The roles of WRN and BLM RecQ helicases in the Alternative Lengthening of Telomeres

    PubMed Central

    Mendez-Bermudez, Aaron; Hidalgo-Bravo, Alberto; Cotton, Victoria E.; Gravani, Athanasia; Jeyapalan, Jennie N.; Royle, Nicola J.

    2012-01-01

    Approximately 10% of all cancers, but a higher proportion of sarcomas, use the recombination-based alternative lengthening of telomeres (ALT) to maintain telomeres. Two RecQ helicase genes, BLM and WRN, play important roles in homologous recombination repair and they have been implicated in telomeric recombination activity, but their precise roles in ALT are unclear. Using analysis of sequence variation present in human telomeres, we found that a WRN– ALT+ cell line lacks the class of complex telomere mutations attributed to inter-telomeric recombination in other ALT+ cell lines. This suggests that WRN facilitates inter-telomeric recombination when there are sequence differences between the donor and recipient molecules or that sister-telomere interactions are suppressed in the presence of WRN and this promotes inter-telomeric recombination. Depleting BLM in the WRN– ALT+ cell line increased the mutation frequency at telomeres and at the MS32 minisatellite, which is a marker of ALT. The absence of complex telomere mutations persisted in BLM-depleted clones, and there was a clear increase in sequence homogenization across the telomere and MS32 repeat arrays. These data indicate that BLM suppresses unequal sister chromatid interactions that result in excessive homogenization at MS32 and at telomeres in ALT+ cells. PMID:22989712

  16. The roles of WRN and BLM RecQ helicases in the Alternative Lengthening of Telomeres.

    PubMed

    Mendez-Bermudez, Aaron; Hidalgo-Bravo, Alberto; Cotton, Victoria E; Gravani, Athanasia; Jeyapalan, Jennie N; Royle, Nicola J

    2012-11-01

    Approximately 10% of all cancers, but a higher proportion of sarcomas, use the recombination-based alternative lengthening of telomeres (ALT) to maintain telomeres. Two RecQ helicase genes, BLM and WRN, play important roles in homologous recombination repair and they have been implicated in telomeric recombination activity, but their precise roles in ALT are unclear. Using analysis of sequence variation present in human telomeres, we found that a WRN- ALT+ cell line lacks the class of complex telomere mutations attributed to inter-telomeric recombination in other ALT+ cell lines. This suggests that WRN facilitates inter-telomeric recombination when there are sequence differences between the donor and recipient molecules or that sister-telomere interactions are suppressed in the presence of WRN and this promotes inter-telomeric recombination. Depleting BLM in the WRN- ALT+ cell line increased the mutation frequency at telomeres and at the MS32 minisatellite, which is a marker of ALT. The absence of complex telomere mutations persisted in BLM-depleted clones, and there was a clear increase in sequence homogenization across the telomere and MS32 repeat arrays. These data indicate that BLM suppresses unequal sister chromatid interactions that result in excessive homogenization at MS32 and at telomeres in ALT+ cells.

  17. Conserved Sequences at the Origin of Adenovirus DNA Replication

    PubMed Central

    Stillman, Bruce W.; Topp, William C.; Engler, Jeffrey A.

    1982-01-01

    The origin of adenovirus DNA replication lies within an inverted sequence repetition at either end of the linear, double-stranded viral DNA. Initiation of DNA replication is primed by a deoxynucleoside that is covalently linked to a protein, which remains bound to the newly synthesized DNA. We demonstrate that virion-derived DNA-protein complexes from five human adenovirus serological subgroups (A to E) can act as a template for both the initiation and the elongation of DNA replication in vitro, using nuclear extracts from adenovirus type 2 (Ad2)-infected HeLa cells. The heterologous template DNA-protein complexes were not as active as the homologous Ad2 DNA, most probably due to inefficient initiation by Ad2 replication factors. In an attempt to identify common features which may permit this replication, we have also sequenced the inverted terminal repeated DNA from human adenovirus serotypes Ad4 (group E), Ad9 and Ad10 (group D), and Ad31 (group A), and we have compared these to previously determined sequences from Ad2 and Ad5 (group C), Ad7 (group B), and Ad12 and Ad18 (group A) DNA. In all cases, the sequence around the origin of DNA replication can be divided into two structural domains: a proximal A · T-rich region which is partially conserved among these serotypes, and a distal G · C-rich region which is less well conserved. The G · C-rich region contains sequences similar to sequences present in papovavirus replication origins. The two domains may reflect a dual mechanism for initiation of DNA replication: adenovirus-specific protein priming of replication, and subsequent utilization of this primer by host replication factors for completion of DNA synthesis. PMID:7143575

  18. Nucleotide Sequence and Genetic Structure of a Novel Carbaryl Hydrolase Gene (cehA) from Rhizobium sp. Strain AC100

    PubMed Central

    Hashimoto, Masayuki; Fukui, Mitsuru; Hayano, Kouichi; Hayatsu, Masahito

    2002-01-01

    Rhizobium sp. strain AC100, which is capable of degrading carbaryl (1-naphthyl-N-methylcarbamate), was isolated from soil treated with carbaryl. This bacterium hydrolyzed carbaryl to 1-naphthol and methylamine. Carbaryl hydrolase from the strain was purified to homogeneity, and its N-terminal sequence, molecular mass (82 kDa), and enzymatic properties were determined. The purified enzyme hydrolyzed 1-naphthyl acetate and 4-nitrophenyl acetate, indicating that the enzyme is an esterase. We then cloned the carbaryl hydrolase gene (cehA) from the plasmid DNA of the strain and determined the nucleotide sequence of the 10-kb region containing cehA. No homologous sequences were found by a database homology search using the nucleotide and deduced amino acid sequences of the cehA gene. Six open reading frames including the cehA gene were found in the 10-kb region, and sequence analysis shows that the cehA gene is flanked by two copies of an insertion sequence-like element, suggesting that it forms part of a composite transposon. PMID:11872471

  19. In vivo gene correction with targeted sequence substitution through microhomology-mediated end joining.

    PubMed

    Shin, Jeong Hong; Jung, Soobin; Ramakrishna, Suresh; Kim, Hyongbum Henry; Lee, Junwon

    2018-07-07

    Genome editing technology using programmable nucleases has rapidly evolved in recent years. Precise integration of a transgene relies mainly on homology-directed repair (HDR). However, an HDR-based genome-editing approach is less efficient than non-homologous end-joining (NHEJ). Recently, a microhomology-mediated end-joining (MMEJ)-based transgene integration approach was developed, showing feasibility both in vitro and in vivo. We expanded this method to achieve targeted sequence substitution (TSS) of mutated sequences with normal sequences in vitro and in vivo, using double-guide RNAs (gRNAs) and a donor template flanked by the microhomologies and the target sequences of the gRNAs. Our method achieved more efficient sequence substitution than the HDR-based method in vitro using a reporter cell line, and led to the survival of a hereditary tyrosinemia mouse model in vivo. The proposed MMEJ-based TSS approach could provide a novel therapeutic strategy, in addition to HDR, to achieve gene correction from a mutated sequence to a normal sequence. Copyright © 2018 Elsevier Inc. All rights reserved.

  20. A ribosomal orphon sequence from Xenopus laevis flanked by novel low copy number repetitive elements.

    PubMed

    Guimond, A; Moss, T

    1999-02-01

    We have used a differential cloning approach to isolate ribosomal/non-ribosomal frontier sequences from Xenopus laevis. A ribosomal intergenic spacer sequence (IGS) was cloned and shown not to be physically linked with the ribosomal locus. This ribosomal orphon contained the IGS sequences found immediately downstream of the 28S gene and included an array of enhancer repetitions and a non-functional spacer promoter. The orphon sequence was flanked by a member of the novel 'Frt' low copy repetitive element family. Three individual Frt repeats were sequenced, and all members of this family were shown to lie clustered at two chromosomal sites, one of which contained the ribosomal orphon. One of the Frt elements contained an insertion of 297 bp that showed extensive homology to sequences within at least three other Xenopus genes. Each homology region was flanked by members of the T2 family of short interspersed repetitive elements (SINEs) and by its target insertion sequence, suggesting multiple translocation events. The data are discussed in terms of the evolution of the ribosomal gene locus.

  1. ATP hydrolysis provides functions that promote rejection of pairings between different copies of long repeated sequences

    PubMed Central

    Danilowicz, Claudia; Hermans, Laura; Coljee, Vincent; Prévost, Chantal

    2017-01-01

    During DNA recombination and repair, RecA family proteins must promote rapid joining of homologous DNA. Repeated sequences longer than 100 bp occupy more than 1% of bacterial genomes; however, commitment to strand exchange was believed to occur after testing ∼20–30 bp. If that were true, pairings between different copies of long repeated sequences would usually become irreversible. Our experiments reveal that in the presence of ATP hydrolysis even 75 bp sequence-matched strand exchange products remain quite reversible. Experiments also indicate that when ATP hydrolysis is present, flanking heterologous dsDNA regions increase the reversibility of sequence-matched strand exchange products with lengths up to ∼75 bp. Results of molecular dynamics simulations provide insight into how ATP hydrolysis destabilizes strand exchange products. These results inspired a model that shows how pairings between long repeated sequences could be efficiently rejected even though most homologous pairings form irreversible products. PMID:28854739

  2. Isolation and characterization of an AGAMOUS homolog from Fraxinus pennsylvanica

    Treesearch

    Ningxia Du; Paula M. Pijut

    2010-01-01

    An AGAMOUS homolog (FpAG) was isolated from green ash (Fraxinus pennsylvanica) using a reverse transcriptase polymerase chain reaction method. Southern blot analysis indicated that FpAG was present as a single-copy sequence in the genome of green ash. RNA accumulated in the reproductive tissues (female...

  3. Decompositions of the polyhedral product functor with applications to moment-angle complexes and related spaces

    PubMed Central

    Bahri, A.; Bendersky, M.; Cohen, F. R.; Gitler, S.

    2009-01-01

    This article gives a natural decomposition of the suspension of a generalized moment-angle complex or partial product space which arises as the polyhedral product functor described below. The introduction and application of the smash product moment-angle complex provides a precise identification of the stable homotopy type of the values of the polyhedral product functor. One direct consequence is an analysis of the associated cohomology. For the special case of the complements of certain subspace arrangements, the geometrical decomposition implies the homological decomposition in earlier work of others as described below. Because the splitting is geometric, an analogous homological decomposition for a generalized moment-angle complex applies for any homology theory. Implied, therefore, is a decomposition for the Stanley–Reisner ring of a finite simplicial complex, and natural generalizations. PMID:19620727

  4. Decompositions of the polyhedral product functor with applications to moment-angle complexes and related spaces.

    PubMed

    Bahri, A; Bendersky, M; Cohen, F R; Gitler, S

    2009-07-28

    This article gives a natural decomposition of the suspension of a generalized moment-angle complex or partial product space which arises as the polyhedral product functor described below. The introduction and application of the smash product moment-angle complex provides a precise identification of the stable homotopy type of the values of the polyhedral product functor. One direct consequence is an analysis of the associated cohomology. For the special case of the complements of certain subspace arrangements, the geometrical decomposition implies the homological decomposition in earlier work of others as described below. Because the splitting is geometric, an analogous homological decomposition for a generalized moment-angle complex applies for any homology theory. Implied, therefore, is a decomposition for the Stanley-Reisner ring of a finite simplicial complex, and natural generalizations.

  5. Nucleotide sequence of the L1 ribosomal protein gene of Xenopus laevis: remarkable sequence homology among introns.

    PubMed Central

    Loreni, F; Ruberti, I; Bozzoni, I; Pierandrei-Amaldi, P; Amaldi, F

    1985-01-01

    Ribosomal protein L1 is encoded by two genes in Xenopus laevis. The comparison of two cDNA sequences shows that the two L1 gene copies (L1a and L1b) have diverged in many silent sites and very few replacement sites; moreover, a small duplication occurred at the very end of the coding region of the L1b gene, which thus codes for a product five amino acids longer than that coded by L1a. Quantitatively, the divergence between the two L1 genes confirms that a whole genome duplication took place in Xenopus laevis approximately 30 million years ago. A genomic fragment containing one of the two L1 gene copies (L1a), with its nine introns and flanking regions, has been completely sequenced. The 5' end of this gene has been mapped within a 20-pyrimidine stretch, as already found for other vertebrate ribosomal protein genes. Four of the nine introns have a 60-nucleotide sequence with 80% homology; within this region some boxes, one of which is 16 nucleotides long, are 100% homologous among the four introns. This feature of L1a gene introns is interesting since we have previously shown that the activity of this gene is regulated at a post-transcriptional level and involves the block of the normal splicing of some intron sequences. PMID:3841512

  6. Homologous prominence non-radial eruptions: A case study

    NASA Astrophysics Data System (ADS)

    Duchlev, P.; Koleva, K.; Madjarska, M. S.; Dechev, M.

    2016-10-01

    The present study provides important details on homologous eruptions of a solar prominence that occurred in active region NOAA 10904 on 2006 August 22. We report on the pre-eruptive phase of the homologous feature as well as the kinematics and morphology of the fourth in a series of prominence eruptions, which is critical in defining the nature of the previous consecutive eruptions. The evolution of the overlying coronal field during homologous eruptions is discussed, and a new observational criterion for homologous eruptions is provided. We find a distinctive sequence of three activation periods, each containing pre-eruptive precursors such as a brightening and enlarging of the prominence body, followed by small surge-like ejections from its southern end observed in radio at 17 GHz. We analyse a fourth eruption that clearly indicates a full reformation of the prominence after the third eruption. The fourth eruption, although occurring 11 h later, has an identical morphology, the same angle of propagation with respect to the radial direction, and a kinematic evolution similar to that of the previous three eruptions. An important feature of this homologous eruptive prominence sequence is that the maximum height increases with each consecutive eruption. The present analysis establishes that all four eruptions observed in Hα are of confined type, with the third eruption undergoing a thermal disappearance during its eruptive phase. We suggest that the observation of the same direction of the magnetic flux rope (MFR) ejections can be considered an additional observational criterion for MFR homology. This observational indication for homologous eruptions is important, especially in the case of events of typical or poorly distinguishable morphology of eruptive solar phenomena.

  7. dbSWEET: An Integrated Resource for SWEET Superfamily to Understand, Analyze and Predict the Function of Sugar Transporters in Prokaryotes and Eukaryotes.

    PubMed

    Gupta, Ankita; Sankararamakrishnan, Ramasubbu

    2018-04-14

    SWEET (Sweet Will Eventually be Exported Transporter) proteins have been recently discovered and form one of the three major families of sugar transporters. Homologs of SWEET are found in both prokaryotes and eukaryotes. Bacterial SWEET homologs have three transmembrane segments forming a triple-helical bundle and function as dimers. Eukaryotic SWEETs have seven transmembrane helical segments forming two triple-helical bundles with a linker helix. SWEET homologs have been shown to be involved in several important physiological processes in plants. However, not much is known regarding the biological significance of SWEET homologs in prokaryotes and in mammals. We have collected more than 2000 SWEET homologs from both prokaryotes and eukaryotes. For each homolog, we have modeled three different conformational states representing outward open, inward open and occluded states. We have provided details regarding substrate-interacting residues and residues forming the selectivity filter for each SWEET homolog. Several search and analysis options are available. Users can generate a phylogenetic tree and a structure-based sequence alignment for a selected set of sequences. With no metazoan SWEETs functionally characterized, the features observed in the selectivity filter residues can be used to predict the potential substrates likely to be transported by metazoan SWEETs. We believe that this database will help researchers design mutational experiments and simulation studies that will help advance our understanding of the physiological role of SWEET homologs. This database is freely available to the scientific community at http://bioinfo.iitk.ac.in/bioinfo/dbSWEET/Home. Copyright © 2018 Elsevier Ltd. All rights reserved.

  8. Hidden Markov models-based system (HMMSPECTR) for detecting structural homologies on the basis of sequential information.

    PubMed

    Tsigelny, Igor; Sharikov, Yuriy; Ten Eyck, Lynn F

    2002-05-01

    HMMSPECTR is a tool for finding putative structural homologs for proteins with known primary sequences. HMMSPECTR contains four major components: a data warehouse with the hidden Markov models (HMM) and alignment libraries; a search program which compares the initial protein sequences with the libraries of HMMs; a secondary structure prediction and comparison program; and a dominant protein selection program that prepares the set of 10-15 "best" proteins from the chosen HMMs. The data warehouse contains four libraries of HMMs. The first two libraries were constructed using different HMM preparation options of the HMMER program. The third library contains parts ("partial HMM") of initial alignments. The fourth library contains trained HMMs. We tested our program against all of the protein targets proposed in the CASP4 competition. The data warehouse included libraries of structural alignments and HMMs constructed on the basis of proteins publicly available in the Protein Data Bank before the CASP4 meeting. The newest fully automated versions of HMMSPECTR 1.02 and 1.02ss produced better results than the best result reported at CASP4, either by r.m.s.d. or by length (or both), in 64% (HMMSPECTR 1.02) and 79% (HMMSPECTR 1.02ss) of the cases. The improvement is most notable for the targets with complexity 4 (difficult fold recognition cases).

  9. De Novo Transcriptome Analysis of Allium cepa L. (Onion) Bulb to Identify Allergens and Epitopes.

    PubMed

    Rajkumar, Hemalatha; Ramagoni, Ramesh Kumar; Anchoju, Vijayendra Chary; Vankudavath, Raju Naik; Syed, Arshi Uz Zaman

    2015-01-01

    Allium cepa (onion) is a diploid plant with one of the largest nuclear genomes among all diploids. Onion is an example of an under-researched crop which has a complex heterozygous genome. Neither allergenic protein data nor genomic data are available for onion. This study was conducted to establish a transcriptome catalogue of onion bulb that will enable us to study onion-related genes involved in medicinal use and allergies. The transcriptome dataset generated from onion bulb using the Illumina HiSeq 2000 technology yielded a total of 99,074,309 high-quality raw reads (~20 Gb). Based on sequence homology, onion genes were categorized into 49 different functional groups. However, most of the genes were classified as 'unknown' in all three gene ontology categories. Of the categorized genes, 61.2% were assigned metabolic functions, followed by categories such as binding, cellular processes, catalytic activity, and cell part. With BLASTx top hit analysis, a total of 2,511 homologous allergenic sequences were found, which had 37-100% similarity with 46 different types of allergens existing in the database. From the 46 contigs or allergens, 521 B-cell linear epitopes were identified using the BepiPred linear epitope prediction tool. This is the first comprehensive insight into the transcriptome of onion bulb tissue using NGS technology, and it can be used to map IgE epitopes and to predict the structures and functions of various proteins.

  10. cDNA cloning of the human peroxisomal enoyl-CoA hydratase: 3-Hydroxyacyl-CoA dehydrogenase bifunctional enzyme and localization to chromosome 3q26.3-3q28: A free left Alu arm is inserted in the 3′ noncoding region

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoefler, G.; Forstner, M.; Hulla, W.

    1994-01-01

    Enoyl-CoA hydratase:3-hydroxyacyl-CoA dehydrogenase bifunctional enzyme is one of the four enzymes of the peroxisomal β-oxidation pathway. Here, the authors report the full-length human cDNA sequence and the localization of the corresponding gene on chromosome 3q26.3-3q28. The cDNA sequence spans 3779 nucleotides with an open reading frame of 2169 nucleotides. The tripeptide SKL at the carboxy terminus, known to serve as a peroxisomal targeting signal, is present. DNA sequence comparison of the coding region showed an 80% homology between human and rat bifunctional enzyme cDNA. The 3′ noncoding sequence contains 117 nucleotides homologous to an Alu repeat. Based on sequence comparison, they propose that these nucleotides are a free left Alu arm with 86% homology to the Alu-J family. RNA analysis shows one band with highest intensity in liver and kidney. This cDNA will allow in-depth studies of molecular defects in patients with defective peroxisomal bifunctional enzyme. Moreover, it will also provide a means for studying the regulation of peroxisomal β-oxidation in humans. 33 refs., 5 figs.

  11. SANSparallel: interactive homology search against Uniprot.

    PubMed

    Somervuo, Panu; Holm, Liisa

    2015-07-01

    Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. Osteoblast-specific factor 2: cloning of a putative bone adhesion protein with homology with the insect protein fasciclin I.

    PubMed Central

    Takeshita, S; Kikuno, R; Tezuka, K; Amann, E

    1993-01-01

    A cDNA library prepared from the mouse osteoblastic cell line MC3T3-E1 was screened for the presence of specifically expressed genes by employing a combined subtraction hybridization/differential screening approach. A cDNA was identified and sequenced which encodes a protein designated osteoblast-specific factor 2 (OSF-2) comprising 811 amino acids. OSF-2 has a typical signal sequence, followed by a cysteine-rich domain, a fourfold repeated domain and a C-terminal domain. The protein lacks a typical transmembrane region. The fourfold repeated domain of OSF-2 shows homology with the insect protein fasciclin I. RNA analyses revealed that OSF-2 is expressed in bone and to a lesser extent in lung, but not in other tissues. Mouse OSF-2 cDNA was subsequently used as a probe to clone the human counterpart. Mouse and human OSF-2 show a high amino acid sequence conservation except for the signal sequence and two regions in the C-terminal domain in which 'in-frame' insertions or deletions are observed, implying alternative splicing events. On the basis of the amino acid sequence homology with fasciclin I, we suggest that OSF-2 functions as a homophilic adhesion molecule in bone formation. PMID:8363580

  13. Construction and production of oncotropic vectors, derived from MVM(p), that share reduced sequence homology with helper plasmids.

    PubMed

    Clément, Nathalie; Velu, Thierry; Brandenburger, Annick

    2002-09-01

    The production of currently available vectors derived from autonomous parvoviruses requires the expression of capsid proteins in trans, from helper sequences. Cotransfection of a helper plasmid always generates significant amounts of replication-competent virus (RCV) that can be reduced by the integration of helper sequences into a packaging cell line. Although stocks of minute virus of mice (MVM)-based vectors with no detectable RCV could be produced by transfection into packaging cells, RCVs appear after one or two rounds of replication, precluding further amplification of the vector stock. Indeed, once RCVs become detectable, they are efficiently amplified and rapidly take over the culture. Theoretically, RCV-free vector stocks could be produced if all homology between vector and helper DNA is eliminated, thus preventing homologous recombination. We constructed new vectors based on the structure of spontaneously occurring defective particles of MVM. Based on published observations related to the size of vectors and the sequence of the viral origin of replication, these vectors were modified by the insertion of foreign DNA sequences downstream of the transgene and by the introduction of a consensus NS-1 nick site near the origin of replication to optimize their production. In one of the vectors, the inserted fragment of mouse genomic DNA had a synergistic effect with the modified origin of replication in increasing vector production.

  14. Assignment of the human caltractin gene (CALT) to Xq28 by fluorescence in situ hybridization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tanaka, Tanaka; Okui, Keiko; Nakamura, Yusuke

    1994-12-01

    The centrosome is the major microtubule-organizing center of interphase eukaryotic cells, and its duplication is essential to eukaryotic cell division. Caltractin, a structural component of centrosomes, is highly homologous in amino acid sequence to the product of the CDC31 gene of Saccharomyces cerevisiae. In S. cerevisiae, an important role for CDC31 in duplication of the spindle pole body (SPB), a kind of microtubule-organizing center, has been demonstrated by an experiment in which mutant CDC31 prevented SPB duplication and led to formation of a monopolar spindle. In view of the localization of human caltractin in centrosomes and the sequence homology it bears to yeast CDC31, it is reasonable to assume that caltractin functions in humans as CDC31 does in yeast. As a part of the Human Genome Project, we have been determining nucleotide sequences of DNA clones randomly selected from a directionally cloned cDNA library constructed from fetal brain mRNA obtained from Clontech (La Jolla, CA). By comparing 5′ partial DNA sequences of these cDNA clones with known DNA sequences in the database, we found one clone that was highly homologous to the caltractin gene of Chlamydomonas, which turned out to be the same as a human gene identified recently. 4 refs., 1 fig.

  15. Structural analysis of key gap junction domains--Lessons from genome data and disease-linked mutants.

    PubMed

    Bai, Donglin

    2016-02-01

    A gap junction (GJ) channel is formed by docking of two GJ hemichannels and each of these hemichannels is a hexamer of connexins. All connexin genes have been identified in human, mouse, and rat genomes and their homologous genes in many other vertebrates are available in public databases. The protein sequences of these connexins align well with high sequence identity in the same connexin across different species. Domains in closely related connexins and several residues in all known connexins are also well-conserved. These conserved residues form signatures (also known as sequence logos) in these domains and are likely to play important biological functions. In this review, the sequence logos of individual connexins, groups of connexins with common ancestors, and all connexins are analyzed to visualize natural evolutionary variations and the hot spots for human disease-linked mutations. Several gap junction domains are homologous, likely forming similar structures essential for their function. The availability of a high resolution Cx26 GJ structure and the subsequently-derived homology structure models for other connexin GJ channels elevated our understanding of sequence logos at the three-dimensional GJ structure level, thus facilitating the understanding of how disease-linked connexin mutants might impair GJ structure and function. This knowledge will enable the design of complementary variants to rescue disease-linked mutants. Copyright © 2015 Elsevier Ltd. All rights reserved.

  16. Saccharomyces cerevisiae SSB1 protein and its relationship to nucleolar RNA-binding proteins.

    PubMed Central

    Jong, A Y; Clark, M W; Gilbert, M; Oehm, A; Campbell, J L

    1987-01-01

    To better define the function of Saccharomyces cerevisiae SSB1, an abundant single-stranded nucleic acid-binding protein, we determined the nucleotide sequence of the SSB1 gene and compared it with those of other proteins of known function. The deduced amino acid sequence contains 293 residues and has an Mr of 32,853. There are several stretches of sequence characteristic of other eucaryotic single-stranded nucleic acid-binding proteins. At the amino terminus, residues 39 to 54 are highly homologous to a peptide in calf thymus UP1 and UP2 and a human heterogeneous nuclear ribonucleoprotein. Residues 125 to 162 constitute a fivefold tandem repeat of the sequence RGGFRG, the composition of which suggests a nucleic acid-binding site. Near the C terminus, residues 233 to 245 are homologous to several RNA-binding proteins. Of the 18 C-terminal residues, 10 are acidic, a characteristic of the procaryotic single-stranded DNA-binding proteins and eucaryotic DNA- and RNA-binding proteins. In addition, examination of the subcellular distribution of SSB1 by immunofluorescence microscopy indicated that SSB1 is a nuclear protein, predominantly located in the nucleolus. Sequence homologies and the nucleolar localization make it likely that SSB1 functions in RNA metabolism in vivo, although an additional role in DNA metabolism cannot be excluded. PMID:2823109

  17. Phylogenetic distribution of plant snoRNA families.

    PubMed

    Patra Bhattacharya, Deblina; Canzler, Sebastian; Kehr, Stephanie; Hertel, Jana; Grosse, Ivo; Stadler, Peter F

    2016-11-24

    Small nucleolar RNAs (snoRNAs) are one of the most ancient families amongst non-protein-coding RNAs. They are ubiquitous in Archaea and Eukarya but absent in bacteria. Their main function is to target chemical modifications of ribosomal RNAs. They fall into two classes, box C/D snoRNAs and box H/ACA snoRNAs, which are clearly distinguished by conserved sequence motifs and the type of chemical modification that they govern. Similarly to microRNAs, snoRNAs appear in distinct families of homologs that affect homologous targets. In animals, snoRNAs and their evolution have been studied in much detail. In plants, however, their evolution has attracted comparatively little attention. In order to chart the phylogenetic distribution of individual snoRNA families in plants, we applied a sophisticated approach for identifying homologs of known plant snoRNAs across the plant kingdom. Because snoRNAs evolve relatively fast, information on conserved sequence boxes, target sequences, and secondary structure is combined to identify additional snoRNAs. We identified 296 families of snoRNAs in 24 species and traced their evolution throughout the plant kingdom. Many of the plant snoRNA families comprise paralogs. We also found that targets are well conserved for most snoRNA families. The sequence conservation of snoRNAs is sufficient to establish homologies between phyla. The degree of this conservation tapers off, however, between land plants and algae. Plant snoRNAs are frequently organized in highly conserved spatial clusters. As a resource for further investigations, we provide carefully curated and annotated alignments for each snoRNA family under investigation.

  18. Divergence and evolution of homologous regions of Bombyx mori nuclear polyhedrosis virus.

    PubMed Central

    Majima, K; Kobara, R; Maeda, S

    1993-01-01

    Homologous regions (hrs) (hr1, hr2-left, hr2-right, hr3, hr4-left, hr4-right, and hr5) similar to those found in the Autographa californica nuclear polyhedrosis virus (AcNPV) genome were found in the Bombyx mori NPV (BmNPV) genome. The BmNPV hrs contained two to eight repeats of a homologous nucleotide sequence which were on average about 75 bp long. All of these homologous sequence repeats contained a 26-bp-long palindrome motif with an EcoRI or EcoRI-like site at its core. The consensus sequence of the BmNPV hrs showed 95% conservation with respect to those found in AcNPV. Nucleotide sequence analysis indicated that hr2-left and hr2-right of BmNPV evolved from an ancestor similar to hr2 of AcNPV by inversion, cleavage, and ligation. The polarities of the BmNPV and AcNPV hrs were conserved except for that of hr4-left. Within hr4-right of BmNPV, four repeats of a previously undescribed palindrome motif were found. Bmhr5D, a BmNPV mutant which lacked hr5, replicated at a rate similar to that of wild-type BmNPV in BmN cells and silkworm larvae, indicating that hr5 was not essential for viral replication. After ten passages of Bmhr5D in BmN cells, no detectable changes in its genome were observed by restriction endonuclease analysis. The evolution and divergence of the BmNPV genome are also discussed. PMID:8230471

  19. Genomics of Escherichia and Shigella

    NASA Astrophysics Data System (ADS)

    Perna, Nicole T.

    The laboratory workhorse Escherichia coli K-12 is among the most intensively studied living organisms on earth, and this single strain serves as the model system behind much of our understanding of prokaryotic molecular biology. Dense genome sequencing and recent insightful comparative analyses are making the species E. coli, as a whole, an emerging system for studying prokaryotic population genetics and the relationship between system-scale, or genome-scale, molecular evolution and complex traits like host range and pathogenic potential. A genomic perspective has revealed a coherent but dynamic species, united by intraspecific gene flow via homologous lateral or horizontal transfer and differentiated by content flux mediated by acquisition of DNA segments from interspecies transfers.

  20. A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis.

    PubMed

    Liu, Bin; Wang, Xiaolong; Lin, Lei; Dong, Qiwen; Wang, Xuan

    2008-12-01

    Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machines (SVMs) are the most effective and accurate methods for solving these problems. A key step to improve the performance of SVM-based methods is to find a suitable representation of protein sequences. In this paper, a novel building block of proteins called Top-n-grams is presented, which contains the evolutionary information extracted from protein sequence frequency profiles. The profiles are calculated from the multiple sequence alignments output by PSI-BLAST and converted into Top-n-grams. The protein sequences are then transformed into fixed-dimension feature vectors by counting the occurrences of each Top-n-gram. The training vectors are used to train SVM classifiers, which are then used to classify the test protein sequences. We demonstrate that the prediction performance of remote homology detection and fold recognition can be improved by combining Top-n-grams and latent semantic analysis (LSA), an efficient feature extraction technique from natural language processing. When tested on superfamily and fold benchmarks, the method combining Top-n-grams and LSA gives significantly better results than related methods. The method based on Top-n-grams significantly outperforms methods based on many other building blocks, including N-grams, patterns, motifs and binary profiles. Top-n-grams are therefore a good building block for protein sequences and can be widely used in many computational biology tasks, such as sequence alignment, prediction of domain boundaries, design of knowledge-based potentials and prediction of protein binding sites.
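    A minimal Python sketch of the Top-n-gram feature extraction described in this abstract, offered only as an illustration. It assumes each profile column is given as a dict of amino-acid frequencies and that a column's Top-n-gram is the string of its n most frequent residues, highest first. The alphabetical tie-breaking, the helper names `top_n_grams` and `feature_vector`, and the toy profile are hypothetical choices for this sketch, not the authors' code; the PSI-BLAST, LSA and SVM steps are omitted.

```python
from collections import Counter
from itertools import permutations

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def top_n_grams(profile, n):
    """Return one Top-n-gram per profile column.

    Each column is a dict mapping amino-acid letters to frequencies.
    The Top-n-gram is the n most frequent residues, highest first;
    ties are broken alphabetically (an assumption of this sketch).
    """
    grams = []
    for column in profile:
        ranked = sorted(AMINO_ACIDS, key=lambda aa: (-column.get(aa, 0.0), aa))
        grams.append("".join(ranked[:n]))
    return grams


def feature_vector(profile, n):
    """Map a profile to a fixed-dimension vector of Top-n-gram counts."""
    counts = Counter(top_n_grams(profile, n))
    # One dimension per ordered selection of n distinct amino acids.
    vocabulary = ["".join(p) for p in permutations(AMINO_ACIDS, n)]
    return [counts[gram] for gram in vocabulary]


# Toy three-column profile (hypothetical frequencies).
profile = [
    {"A": 0.6, "G": 0.3, "S": 0.1},
    {"L": 0.5, "I": 0.4, "V": 0.1},
    {"A": 0.7, "G": 0.2, "T": 0.1},
]
print(top_n_grams(profile, 2))  # ['AG', 'LI', 'AG']
print(len(feature_vector(profile, 2)))  # 380
```

    With n = 2 the vector has 20 × 19 = 380 entries, one per ordered pair of distinct residues, so every protein maps to the same dimension regardless of its length; the toy profile contributes the Top-2-grams AG, LI and AG.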

  1. Exploiting rice-sorghum synteny for targeted development of EST-SSRs to enrich the sorghum genetic linkage map.

    PubMed

    Ramu, P; Kassahun, B; Senthilvel, S; Ashok Kumar, C; Jayashree, B; Folkertsma, R T; Reddy, L Ananda; Kuruvinashetti, M S; Haussmann, B I G; Hash, C T

    2009-11-01

    The sequencing and detailed comparative functional analysis of the genomes of a number of select botanical models open new doors into comparative genomics among the angiosperms, with potential benefits for the improvement of many orphan crops that feed large populations. In this study, a set of simple sequence repeat (SSR) markers was developed by mining the expressed sequence tag (EST) database of sorghum. Among the SSR-containing sequences, only those sharing considerable homology with rice genomic sequences across the lengths of the 12 rice chromosomes were selected. Thus, 600 SSR-containing sorghum EST sequences (50 homologous sequences on each of the 12 rice chromosomes) were selected, with the intention of covering the corresponding homologous regions of the sorghum genome. Primer pairs were designed, and their ability to detect polymorphism was assessed using the parental pairs of two existing sorghum mapping populations. About 28% of these new markers detected polymorphism in this 4-entry panel. A subset of 55 polymorphic EST-derived SSR markers was mapped onto the existing skeleton map of a recombinant inbred population derived from the cross N13 × E 36-1, which is segregating for Striga resistance and the stay-green component of terminal drought tolerance. These new EST-derived SSR markers mapped across all 10 sorghum linkage groups, mostly to regions expected from prior knowledge of rice-sorghum synteny. The ESTs from which these markers were derived were then mapped in silico onto the aligned sorghum genome sequence, and 88% of the best hits corresponded to linkage-based positions. This study demonstrates the utility of comparative genomic information for the targeted development of markers to fill gaps in the linkage maps of related crop species for which sufficient genomic tools are not available.

  2. Implementing the LIM code: the structural basis for cell type-specific assembly of LIM-homeodomain complexes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bhati, Mugdha; Lee, Christopher; Nancarrow, Amy L.

    2008-09-03

    LIM-homeodomain (LIM-HD) transcription factors form a combinatorial 'LIM code' that contributes to the specification of cell types. In the ventral spinal cord, the binary LIM homeobox protein 3 (Lhx3)/LIM domain-binding protein 1 (Ldb1) complex specifies the formation of V2 interneurons. The additional expression of islet-1 (Isl1) in adjacent cells instead specifies the formation of motor neurons through assembly of a ternary complex in which Isl1 contacts both Lhx3 and Ldb1, displacing Lhx3 as the binding partner of Ldb1. However, little is known about how this molecular switch occurs. Here, we have identified the 30-residue Lhx3-binding domain on Isl1 (Isl1LBD). Although the LIM interaction domain of Ldb1 (Ldb1LID) and Isl1LBD share low levels of sequence homology, X-ray and NMR structures reveal that they bind Lhx3 in an identical manner; that is, Isl1LBD mimics Ldb1LID. These data provide a structural basis for the formation of cell type-specific protein-protein interactions in which unstructured linear motifs with diverse sequences compete to bind protein partners. The resulting alternate protein complexes can target different genes to regulate key biological events.

  3. Predicting a double mutant in the twilight zone of low homology modeling for the skeletal muscle voltage-gated sodium channel subunit beta-1 (Nav1.4 β1)

    PubMed Central

    Scior, Thomas; Paiz-Candia, Bertin; Islas, Ángel A.; Sánchez-Solano, Alfredo; Millan-Perez Peña, Lourdes; Mancilla-Simbro, Claudia; Salinas-Stefanon, Eduardo M.

    2015-01-01

    The molecular structure modeling of the β1 subunit of the skeletal muscle voltage-gated sodium channel (Nav1.4) was carried out in the twilight zone of very low homology. Structural significance can per se be confounded with random sequence similarities. Hence, we combined (i) non-automated computational modeling of weakly homologous 3D templates, some with interfaces to structures analogous to the pore-bearing Nav1.4 α subunit, with (ii) site-directed mutagenesis (SDM) and (iii) electrophysiological experiments to study the structure and function of the β1 subunit. Despite the distant phylogenetic relationships, we found a 3D template that identified two adjacent amino acids leading to the long-awaited loss of function (inactivation) of Nav1.4 channels. This double mutant (T109A, N110A, herein called TANA) was expressed and tested in Chinese hamster ovary (CHO) cells. The present electrophysiological results showed that the double alanine substitution TANA disrupted channel inactivation as if the β1 subunit were not in complex with the α subunit. Exhaustive and unbiased sampling of “all β proteins” (Ig-like, Ig) resulted in a plethora of 3D templates, which were compared to the target secondary structure prediction. Locating TANA was made possible by another “all β protein” structure in complex with an irreversibly bound protein as well as a reversible protein–protein interface (our “Rosetta Stone” effect). This finding coincides with our electrophysiological data (disrupted β1-like voltage dependence), and it is safe to say that the Nav1.4 α/β1 interface is likely to be reversible in nature. PMID:25904995

  4. Predicting a double mutant in the twilight zone of low homology modeling for the skeletal muscle voltage-gated sodium channel subunit beta-1 (Nav1.4 β1).

    PubMed

    Scior, Thomas; Paiz-Candia, Bertin; Islas, Ángel A; Sánchez-Solano, Alfredo; Millan-Perez Peña, Lourdes; Mancilla-Simbro, Claudia; Salinas-Stefanon, Eduardo M

    2015-01-01

    The molecular structure modeling of the β1 subunit of the skeletal muscle voltage-gated sodium channel (Nav1.4) was carried out in the twilight zone of very low homology. Structural significance can per se be confounded with random sequence similarities. Hence, we combined (i) non-automated computational modeling of weakly homologous 3D templates, some with interfaces to structures analogous to the pore-bearing Nav1.4 α subunit, with (ii) site-directed mutagenesis (SDM) and (iii) electrophysiological experiments to study the structure and function of the β1 subunit. Despite the distant phylogenetic relationships, we found a 3D template that identified two adjacent amino acids leading to the long-awaited loss of function (inactivation) of Nav1.4 channels. This double mutant (T109A, N110A, herein called TANA) was expressed and tested in Chinese hamster ovary (CHO) cells. The present electrophysiological results showed that the double alanine substitution TANA disrupted channel inactivation as if the β1 subunit were not in complex with the α subunit. Exhaustive and unbiased sampling of "all β proteins" (Ig-like, Ig) resulted in a plethora of 3D templates, which were compared to the target secondary structure prediction. Locating TANA was made possible by another "all β protein" structure in complex with an irreversibly bound protein as well as a reversible protein-protein interface (our "Rosetta Stone" effect). This finding coincides with our electrophysiological data (disrupted β1-like voltage dependence), and it is safe to say that the Nav1.4 α/β1 interface is likely to be reversible in nature.

  5. Structural basis for antagonism of human interleukin 18 by poxvirus interleukin 18-binding protein

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krumm, Brian; Meng, Xiangzhi; Li, Yongchao

    2009-07-10

    Human interleukin-18 (hIL-18) is a cytokine that plays an important role in inflammation and host defense against microbes. Its activity is regulated in vivo by a naturally occurring antagonist, the human IL-18-binding protein (IL-18BP). Functional homologs of human IL-18BP are encoded by all orthopoxviruses, including variola virus, the causative agent of smallpox. They contribute to virulence by suppressing IL-18-mediated immune responses. Here, we describe the 2.0-Å resolution crystal structure of an orthopoxvirus IL-18BP, ectromelia virus IL-18BP (ectvIL-18BP), in complex with hIL-18. The hIL-18 structure in the complex shows significant conformational change at the binding interface compared with the structure of ligand-free hIL-18, indicating that the binding is mediated by an induced-fit mechanism. EctvIL-18BP adopts a canonical Ig fold and interacts via one edge of its β-sandwich with 3 cavities on the hIL-18 surface through extensive hydrophobic and hydrogen bonding interactions. Most of the ectvIL-18BP residues that participate in these interactions are conserved in both human and viral homologs, explaining their functional equivalence despite limited sequence homology. EctvIL-18BP blocks a putative receptor-binding site on IL-18, thus preventing IL-18 from engaging its receptor. Our structure provides insights into how IL-18BPs modulate hIL-18 activity. The revealed binding interface provides the basis for rational design of inhibitors against orthopoxvirus IL-18BP (for treating orthopoxvirus infection) or hIL-18 (for treating certain inflammatory and autoimmune diseases).

  6. Development of 7TM receptor-ligand complex models using ligand-biased, semi-empirical helix-bundle repacking in torsion space: application to the agonist interaction of the human dopamine D2 receptor.

    PubMed

    Malo, Marcus; Persson, Ronnie; Svensson, Peder; Luthman, Kristina; Brive, Lars

    2013-03-01

    Prediction of 3D structures of membrane proteins, and of G-protein coupled receptors (GPCRs) in particular, is motivated by their importance in biological systems and the difficulties associated with experimental structure determination. In the present study, a novel method for the prediction of 3D structures of the membrane-embedded region of helical membrane proteins is presented. A large pool of candidate models is produced by repacking the helices of a homology model using Monte Carlo sampling in torsion space, followed by ranking based on their geometric and ligand-binding properties. The trajectory is directed by weak initial restraints that orient the helices towards the original model to improve computational efficiency, and by a ligand that guides the receptor towards a chosen conformational state. The method was validated by construction of the β1 adrenergic receptor model in complex with (S)-cyanopindolol using bovine rhodopsin as template. In addition, models of the dopamine D2 receptor were produced with the selective and rigid agonist (R)-N-propylapomorphine ((R)-NPA) present. A second quality assessment was implemented by evaluating the results from docking a library of 29 ligands with known activity, which further discriminated between receptor models. Agonist binding and recognition by the dopamine D2 receptor is interpreted using the 3D structure model resulting from the approach. This method has the potential to model all types of helical transmembrane proteins for which a structural template with sequence homology sufficient for homology modeling is not available, or is in an incorrect conformational state, but for which sufficient empirical information is accessible.

  7. Three genes in the human MHC class III region near the junction with the class II: Gene for receptor of advanced glycosylation end products, PBX2 homeobox gene and a notch homolog, human counterpart of mouse mammary tumor gene int-3

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sugaya, K.; Fukagawa, T.; Matsumoto, K.

    Cosmid walking of about 250 kb from MHC class III gene CYP21 to class II was conducted. The gene for receptor of advanced glycosylation end products of proteins (RAGE, a member of immunoglobulin super-family molecules), the PBX2 homeobox gene designated HOX12, and the human counterpart of the mouse mammary tumor gene int-3 were found. The contiguous RAGE and HOX12 genes were completely sequenced, and the human int-3 counterpart was partially sequenced and assigned to a Notch homolog. This human Notch homolog, designated NOTCH3, showed both the intracellular portion present in the mouse int-3 sequence and the extracellular portion absent in the int-3. It thus corresponds to the intact form of a Notch-type transmembrane protein. About 20 kb of dense Alu clustering was found just centromeric to the NOTCH3. 48 refs., 9 figs., 2 tabs.

  8. Quantifying the relationship between sequence and three-dimensional structure conservation in RNA

    PubMed Central

    2010-01-01

    Background In recent years, the number of available RNA structures has grown rapidly, reflecting the increased interest in RNA biology. Similarly to the studies carried out two decades ago for proteins, which laid the fundamental groundwork for developing comparative protein structure prediction methods, we are now able to quantify the relationship between sequence and structure conservation in RNA. Results Here we introduce an all-against-all sequence- and three-dimensional (3D) structure-based comparison of a representative set of RNA structures, which has allowed us to quantitatively confirm that: (i) there is a measurable relationship between sequence and structure conservation that weakens for alignments below 60% sequence identity, (ii) evolution tends to conserve RNA structure more than sequence, and (iii) there is a twilight zone for RNA homology detection. Discussion The computational analysis presented here quantitatively describes the relationship between sequence and structure for RNA molecules and defines a twilight zone region for detecting RNA homology. Our work could represent the theoretical basis and limitations for future developments in comparative RNA 3D structure prediction. PMID:20550657
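    Identity thresholds such as the 60% figure above depend on how sequence identity is computed from an alignment, which the abstract does not specify. The sketch below uses one common convention, assumed here rather than taken from the study: identical columns divided by columns in which neither sequence has a gap. Other conventions divide by the full alignment length or by the shorter sequence length, which can shift the value noticeably. The function name is hypothetical.

    ```python
    def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
        """Percent identity of two pre-aligned sequences of equal length.

        Convention (an assumption): identical positions divided by positions
        where neither sequence has a gap, multiplied by 100.
        """
        if len(aligned_a) != len(aligned_b):
            raise ValueError("aligned sequences must have the same length")
        # Keep only columns where both sequences carry a residue.
        paired = [
            (x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap
        ]
        if not paired:
            return 0.0
        # Compare case-insensitively so soft-masked (lowercase) residues still match.
        identical = sum(1 for x, y in paired if x.upper() == y.upper())
        return 100.0 * identical / len(paired)
    ```

    For example, `percent_identity("ACGU-A", "ACGAGA")` skips the gapped column and returns 80.0, since 4 of the 5 paired columns are identical.
    
    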

  9. Non-homologous isofunctional enzymes: a systematic analysis of alternative solutions in enzyme evolution.

    PubMed

    Omelchenko, Marina V; Galperin, Michael Y; Wolf, Yuri I; Koonin, Eugene V

    2010-04-30

    Evolutionarily unrelated proteins that catalyze the same biochemical reactions are often referred to as analogous - as opposed to homologous - enzymes. The existence of numerous alternative, non-homologous enzyme isoforms presents an interesting evolutionary problem; it also complicates genome-based reconstruction of the metabolic pathways in a variety of organisms. In 1998, a systematic search for analogous enzymes resulted in the identification of 105 Enzyme Commission (EC) numbers that included two or more proteins without detectable sequence similarity to each other, including 34 EC nodes where proteins were known (or predicted) to have distinct structural folds, indicating independent evolutionary origins. In the past 12 years, many putative non-homologous isofunctional enzymes were identified in newly sequenced genomes. In addition, efforts in structural genomics resulted in a vastly improved structural coverage of proteomes, providing for definitive assessment of (non)homologous relationships between proteins. We report the results of a comprehensive search for non-homologous isofunctional enzymes (NISE) that yielded 185 EC nodes with two or more experimentally characterized - or predicted - structurally unrelated proteins. Of these NISE sets, only 74 were from the original 1998 list. Structural assignments of the NISE show over-representation of proteins with the TIM barrel fold and the nucleotide-binding Rossmann fold. From the functional perspective, the set of NISE is enriched in hydrolases, particularly carbohydrate hydrolases, and in enzymes involved in defense against oxidative stress. These results indicate that at least some of the non-homologous isofunctional enzymes were recruited relatively recently from enzyme families that are active against related substrates and are sufficiently flexible to accommodate changes in substrate specificity.

  10. Analysis of Homologs of Cry-toxin Receptor-Related Proteins in the Midgut of a Non-Bt Target, Nilaparvata lugens (Stål) (Hemiptera: Delphacidae)

    PubMed Central

    Shao, Ensi; Lin, Li; Liu, Sijun; Zhang, Jiao; Chen, Xuelin; Sha, Li; Huang, Zhipeng; Huang, Biwang; Guan, Xiong

    2018-01-01

    The brown planthopper (BPH), Nilaparvata lugens, is one of the most destructive insect pests in the rice fields of Asia. Like other hemipteran insects, BPH is not susceptible to the Cry toxins of Bacillus thuringiensis (Bt) or to transgenic rice carrying Bt cry genes. Lack of Cry receptors in the midgut is one of the main reasons that BPH is not susceptible to Cry toxins. The main Cry-binding proteins (CBPs) of susceptible insects are cadherin, aminopeptidase N (APN), and alkaline phosphatase (ALP). In this study, we analyzed and validated de novo assembled transcripts from transcriptome sequencing data of BPH to identify and characterize homologs of cadherin, APN, and ALP. We then compared the cadherin-, APN-, and ALP-like proteins of BPH to previously reported CBPs to identify their homologs in BPH. The sequence analysis revealed that at least one cadherin, one APN, and two ALPs of BPH contained homologous functional domains identified from the Cry-binding cadherin, APN, and ALP, respectively. Quantitative real-time polymerase chain reaction, used to verify the expression level of each putative Cry receptor homolog in the BPH midgut, indicated that the APN and ALP homologs of the CBPs were expressed at high or medium-high levels, while the cadherin homolog was expressed at a low level. These results suggest that homologs of CBPs exist in the midgut of BPH. However, differences in the key motifs of CBPs that interact with Cry toxins may be responsible for the insusceptibility of BPH to Cry toxins. PMID:29415259

  11. Structural and Functional Insights into WRKY3 and WRKY4 Transcription Factors to Unravel the WRKY–DNA (W-Box) Complex Interaction in Tomato (Solanum lycopersicum L.). A Computational Approach

    PubMed Central

    Aamir, Mohd; Singh, Vinay K.; Meena, Mukesh; Upadhyay, Ram S.; Gupta, Vijai K.; Singh, Surendra

    2017-01-01

    The WRKY transcription factors (TFs) play a crucial role in the plant defense response against various abiotic and biotic stresses. The role of the WRKY3 and WRKY4 genes in the plant defense response against necrotrophic pathogens is well reported. However, their functional annotation in tomato is largely unknown. In the present work, we have characterized the structural and functional attributes of two identified tomato WRKY transcription factors, WRKY3 (SlWRKY3) and WRKY4 (SlWRKY4), using computational approaches. Arabidopsis WRKY3 (AtWRKY3: NP_178433) and WRKY4 (AtWRKY4: NP_172849) protein sequences were retrieved from the TAIR database, and protein BLAST was used to find their sequence homologs in tomato. Sequence alignment, phylogenetic classification, and motif composition analysis revealed remarkable sequence variation between these two WRKYs. Tomato WRKY3 and WRKY4 cluster with Solanum pennellii, showing monophyletic origin and evolution from their wild homolog. The functional domain responsible for sequence-specific DNA binding in both proteins was modeled by homology modeling in Discovery Studio 3.0, using AtWRKY4 (PDB ID: 1WJ2) and AtWRKY1 (PDB ID: 2AYD) as template structures. The generated models were further evaluated for their accuracy and reliability based on qualitative and quantitative parameters. The modeled proteins were found to satisfy all the crucial energy parameters and showed acceptable Ramachandran statistics when compared to the experimentally resolved NMR solution structures and/or X-ray crystal structures (templates). Superimposition of the functional WRKY domains from SlWRKY3 and SlWRKY4 revealed remarkable structural similarity. Sequence-specific DNA binding by the two WRKYs was explored through DNA-protein docking with the Hex Docking server. The interaction studies found that SlWRKY4 binds the W-box DNA through the WRKYGQK motif with Tyr408, Arg409, and Lys419, with the initial flanking sequences also involved in binding. In contrast, SlWRKY3 interacts through RKYGQK along with residues from the zinc finger motifs. Protein-protein interaction studies were carried out using STRING version 10.0 to explore all possible protein partners involved in associative functional interaction networks. Gene ontology enrichment analysis revealed the functional dimensions of the identified WRKYs and characterized them based on their functional annotation. PMID:28611792

  12. Molecular analysis of the split cox1 gene from the Basidiomycota Agrocybe aegerita: relationship of its introns with homologous Ascomycota introns and divergence levels from common ancestral copies.

    PubMed

    Gonzalez, P; Barroso, G; Labarère, J

    1998-10-05

    The Basidiomycota Agrocybe aegerita (Aa) mitochondrial cox1 gene (6790 nucleotides), encoding a protein of 527 aa (58,377 Da), is split by four large subgroup IB introns possessing site-specific endonucleases assumed to be involved in intron mobility. When compared to other fungal COX1 proteins, the Aa protein is closely related to the COX1 protein of the Basidiomycota Schizophyllum commune (Sc). This clade is related to the Ascomycota proteins studied, with the exception of Schizosaccharomyces pombe (Sp), which falls in an out-group position relative to both higher fungal divisions. When the comparison is extended to other kingdoms, fungal COX1 sequences are found to be more closely related to algal and plant sequences (more than 57.5% aa similarity) than to animal sequences (53.6% aa similarity), contrasting with the previously established close relationship between fungi and animals based on comparisons of nuclear genes. The four Aa cox1 introns are homologous to Ascomycota or algal cox1 introns that share the same location within the exonic sequences. The percentages of identity between the intronic nucleotide sequences suggest a possible acquisition, by lateral transfer, of ancestral copies or of sequences derived from them. These identities extend over the whole intronic sequences, arguing for transfer of the complete intron rather than a transfer limited to the encoded ORF. Intron i4 shares 74% identity at the nucleotide level with Podospora anserina (Pa) intron i14, and up to 90.5% aa similarity between the encoded proteins, i.e. the highest values reported to date between introns of two phylogenetically distant species. This low divergence argues for a recent lateral transfer between the two species. By contrast, the low sequence identities (below 36%) observed between Aa i1 and the homologous Sp i1 or Prototheca wickerhamii (Pw) i1 suggest a long period of evolution after these sequences separated. Introns i2 and i3 show intermediate percentages of identity with their homologous Ascomycota introns. This is the first report of the complete nucleotide sequence and molecular organization of a mitochondrial cox1 gene from any member of the Basidiomycota division.

  13. Genome editing via delivery of Cas9 ribonucleoprotein.

    PubMed

    DeWitt, Mark A; Corn, Jacob E; Carroll, Dana

    2017-05-15

    The CRISPR-Cas genome editing system is very powerful. The format of the CRISPR reagents and the means of delivery are often important factors in targeting efficiency. Delivery of recombinant Cas9 protein and guide RNA (gRNA) as a preformed ribonucleoprotein (RNP) complex has recently emerged as a powerful and general approach to genome editing. Here we outline methods to produce and deliver Cas9 RNPs. A donor DNA carrying desired sequence changes can also be included to program precise sequence introduction or replacement. RNP delivery limits exposure to genome editing reagents, reduces off-target events, drives high rates of homology-dependent repair, and can be applied to embryos to rapidly generate animal models. RNP delivery thus minimizes some of the pitfalls of alternative editing modalities and is rapidly being adopted by the genome editing community. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. Whole Genome Sequencing of Danish Staphylococcus argenteus Reveals a Genetically Diverse Collection with Clear Separation from Staphylococcus aureus.

    PubMed

    Hansen, Thomas A; Bartels, Mette D; Høgh, Silje V; Dons, Lone E; Pedersen, Michael; Jensen, Thøger G; Kemp, Michael; Skov, Marianne N; Gumpert, Heidi; Worning, Peder; Westh, Henrik

    2017-01-01

    Staphylococcus argenteus (S. argenteus) is a newly identified, clinically relevant Staphylococcus species that has been misidentified as Staphylococcus aureus (S. aureus). We identified 25 S. argenteus genomes in our collection of whole-genome-sequenced S. aureus. These genomes were compared to publicly available genomes, and a phylogeny revealed seven clusters corresponding to seven clonal complexes. The genome of S. argenteus was found to differ from that of S. aureus, and a core genome analysis showed that ~33% of the total gene pool was shared between the two species at a 90% homology level. An assessment of mobile elements shows flow of SCCmec cassettes, plasmids, phages, and pathogenicity islands between S. argenteus and S. aureus. This dataset emphasizes that S. argenteus and S. aureus are two separate species that share genetic material.

  15. Structural and transcription analysis of two homologous genes for the P700 chlorophyll a-apoproteins in Chlamydomonas reinhardii: evidence for in vivo trans-splicing

    PubMed Central

    Kück, Ulrich; Choquet, Yves; Schneider, Michel; Dron, Michel; Bennoun, Pierre

    1987-01-01

    The two homologous genes for the P700 chlorophyll a-apoproteins (ps1A1 and ps1A2) are encoded by the plastome of the green alga Chlamydomonas reinhardii. The structure and organization of the two genes were determined by comparison with the homologous genes from maize, using data from heterologous hybridizations as well as from DNA and RNA sequencing. While the ps1A2 gene (736 codons) shows a continuous organization, the ps1A1 gene (754 codons) has some unusual features. This discontinuous gene is split into three separate exons scattered around the circular chloroplast genome. Exon 1 (86 bp) is separated by ∼50 kb from exon 2 (198 bp), which is located ∼90 kb away from exon 3 (1984 bp). All exons are flanked by group II intronic sequences. Transcription analysis reveals that the ps1A2 gene hybridizes with a 2.8-kb transcript, while all exon regions of the ps1A1 gene are homologous to a mature mRNA of 2.7 kb. From our data we conclude that the three distantly separated exonic sequences of the ps1A1 gene constitute a functional gene that probably operates by a trans-splicing mechanism. PMID:16453785

  16. Molecular cloning, sequence analysis and homology modeling of the first caudata amphibian antifreeze-like protein in axolotl (Ambystoma mexicanum).

    PubMed

    Zhang, Songyan; Gao, Jiuxiang; Lu, Yiling; Cai, Shasha; Qiao, Xue; Wang, Yipeng; Yu, Haining

    2013-08-01

    Antifreeze proteins (AFPs) are a class of polypeptides produced by certain vertebrates, plants, fungi, and bacteria that permit their survival in subzero environments. In this study, we report the molecular cloning, sequence analysis and homology-modeled three-dimensional structure of the axolotl antifreeze-like protein (AFLP), the first AFLP from a caudate amphibian. We constructed a full-length spleen cDNA library of axolotl (Ambystoma mexicanum). An EST with the highest similarity (∼42%) to the freeze-responsive liver protein Li16 from Rana sylvatica was identified, and the full-length cDNA was subsequently obtained by RACE-PCR. The axolotl AFLP open reading frame encodes a putative signal peptide and a mature protein of 93 amino acids. The calculated molecular mass and theoretical isoelectric point (pI) of this mature protein were 10128.6 Da and 8.97, respectively. The gene and its deduced protein were further characterized by detailed bioinformatics analysis. The three-dimensional structure of the AFLP was predicted by homology modeling, and the conserved residues required for functionality were identified. The homology model could be of use for effective drug design. This is the first report of an antifreeze-like protein identified from a caudate amphibian.

  17. Primary structure of rat cardiac beta-adrenergic and muscarinic cholinergic receptors obtained by automated DNA sequence analysis: further evidence for a multigene family.

    PubMed

    Gocayne, J; Robinson, D A; FitzGerald, M G; Chung, F Z; Kerlavage, A R; Lentes, K U; Lai, J; Wang, C D; Fraser, C M; Venter, J C

    1987-12-01

    Two cDNA clones, lambda RHM-MF and lambda RHB-DAR, encoding the muscarinic cholinergic receptor and the beta-adrenergic receptor, respectively, have been isolated from a rat heart cDNA library. The cDNA clones were characterized by restriction mapping and automated DNA sequence analysis utilizing fluorescent dye primers. The rat heart muscarinic receptor consists of 466 amino acids and has a calculated molecular weight of 51,543. The rat heart beta-adrenergic receptor consists of 418 amino acids and has a calculated molecular weight of 46,890. The two cardiac receptors have substantial amino acid homology (27.2% identity, 50.6% with favored substitutions). The rat cardiac beta receptor has 88.0% homology (92.5% with favored substitutions) with the human brain beta receptor and the rat cardiac muscarinic receptor has 94.6% homology (97.6% with favored substitutions) with the porcine cardiac muscarinic receptor. The muscarinic cholinergic and beta-adrenergic receptors appear to be as conserved as hemoglobin and cytochrome c but less conserved than histones and are clearly members of a multigene family. These data support our hypothesis, based upon biochemical and immunological evidence, that suggests considerable structural homology and evolutionary conservation between adrenergic and muscarinic cholinergic receptors. To our knowledge, this is the first report utilizing automated DNA sequence analysis to determine the structure of a gene.

  18. Therapeutic Potential of a Scorpion Venom-Derived Antimicrobial Peptide and Its Homologs Against Antibiotic-Resistant Gram-Positive Bacteria.

    PubMed

    Liu, Gaomin; Yang, Fan; Li, Fangfang; Li, Zhongjie; Lang, Yange; Shen, Bingzheng; Wu, Yingliang; Li, Wenxin; Harrison, Patrick L; Strong, Peter N; Xie, Yingqiu; Miller, Keith; Cao, Zhijian

    2018-01-01

    The alarming rise in the prevalence of antibiotic resistance among pathogenic bacteria poses a unique challenge for the development of effective therapeutic agents. Antimicrobial peptides (AMPs) have attracted a great deal of attention as a possible solution to the increasing problem of antibiotic-resistant bacteria. Marcin-18 was identified from the scorpion Mesobuthus martensii at both the DNA and protein levels. The genomic sequence revealed that the marcin-18 coding gene contains a phase-I intron with a GT-AG splice junction located in the DNA region encoding the N-terminal part of the signal peptide. The peptide marcin-18 was also isolated from scorpion venom. A protein sequence homology search revealed that marcin-18 shares extremely high sequence identity with the AMPs meucin-18 and megicin-18. In vitro, chemically synthesized marcin-18 and its homologs (meucin-18 and megicin-18) showed highly potent inhibitory activity against Gram-positive bacteria, including some clinical antibiotic-resistant strains. Importantly, in a mouse acute peritonitis model, these peptides significantly decreased the bacterial load in ascites and rescued nearly all mice heavily infected with clinical methicillin-resistant Staphylococcus aureus from lethal bacteremia. The peptides exerted antimicrobial activity via a bactericidal mechanism, killing bacteria through membrane disruption. Taken together, marcin-18 and its homologs have potential for development as therapeutic agents for treating antibiotic-resistant Gram-positive bacterial infections.

  19. Molecular profiling of appendiceal epithelial tumors using massively parallel sequencing to identify somatic mutations.

    PubMed

    Liu, Xiaoying; Mody, Kabir; de Abreu, Francine B; Pipas, J Marc; Peterson, Jason D; Gallagher, Torrey L; Suriawinata, Arief A; Ripple, Gregory H; Hourdequin, Kathryn C; Smith, Kerrington D; Barth, Richard J; Colacchio, Thomas A; Tsapakos, Michael J; Zaki, Bassem I; Gardner, Timothy B; Gordon, Stuart R; Amos, Christopher I; Wells, Wendy A; Tsongalis, Gregory J

    2014-07-01

    Some epithelial neoplasms of the appendix, including low-grade appendiceal mucinous neoplasm and adenocarcinoma, can result in pseudomyxoma peritonei (PMP). Little is known about the mutational spectra of these tumor types and whether mutations may be of clinical significance with respect to therapeutic selection. In this study, we identified somatic mutations using the Ion Torrent AmpliSeq Cancer Hotspot Panel v2. Specimens consisted of 3 nonneoplastic retention cysts/mucocele, 15 low-grade mucinous neoplasms (LAMNs), 8 low-grade/well-differentiated mucinous adenocarcinomas with pseudomyxoma peritonei, and 12 adenocarcinomas with/without goblet cell/signet ring cell features. Barcoded libraries were prepared from up to 10 ng of extracted DNA and multiplexed on single 318 chips for sequencing. Data analysis was performed using Golden Helix SVS. Variants that remained after the analysis pipeline were individually interrogated using the Integrative Genomics Viewer. A single Janus kinase 3 (JAK3) mutation was detected in the mucocele group. Eight mutations were identified in the V-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog (KRAS) and GNAS complex locus (GNAS) genes among LAMN samples. Additional gene mutations were identified in the AKT1 (v-akt murine thymoma viral oncogene homolog 1), APC (adenomatous polyposis coli), JAK3, MET (met proto-oncogene), phosphatidylinositol-4,5-bisphosphate 3-kinase (PIK3CA), RB1 (retinoblastoma 1), STK11 (serine/threonine kinase 11), and tumor protein p53 (TP53) genes. Among the PMPs, 6 mutations were detected in the KRAS gene and also in the GNAS, TP53, and RB1 genes. Appendiceal cancers showed mutations in the APC, ATM (ataxia telangiectasia mutated), KRAS, IDH1 [isocitrate dehydrogenase 1 (NADP+)], NRAS [neuroblastoma RAS viral (v-ras) oncogene homolog], PIK3CA, SMAD4 (SMAD family member 4), and TP53 genes. Our results suggest molecular heterogeneity among epithelial tumors of the appendix. 
    Next-generation sequencing identified mutational spectra in several subtypes of these tumors that may reflect phenotypic heterogeneity and include mutations relevant to targeted therapies. © 2014 The American Association for Clinical Chemistry.

  20. Transposable element-associated microRNA hairpins produce 21-nt sRNAs integrated into typical microRNA pathways in rice

    PubMed Central

    Ou-Yang, Fangqian; Luo, Qing-Jun; Zhang, Yue; Richardson, Casey R.; Jiang, Yingwen; Rock, Christopher D.

    2013-01-01

    microRNAs (miRNAs) are a class of small RNAs (sRNAs) of ~21 nucleotides (nt) in length processed from foldback hairpins by DICER-LIKE1 (DCL1) or DCL4. They regulate the expression of target mRNAs by base pairing through the RNA-Induced Silencing Complex (RISC). In the RISC, ARGONAUTE1 (AGO1) is the key protein that cleaves miRNA targets at position ten of a miRNA:target duplex. The authenticity of many annotated rice miRNA hairpins is under debate because of their homology to repeat sequences. Some of them, like miR1884b, have been removed from the current release of miRBase based on incomplete information. In this study, we investigated the association of transposable element (TE)-derived miRNAs with typical miRNA pathways (DCL1/4- and AGO1-dependent) using publicly available deep sequencing datasets. Seven miRNA hairpins with 13 unique sRNAs were specifically enriched in AGO1 immunoprecipitation samples and relatively reduced in DCL1/4 knockdown genotypes. Interestingly, these species are ~21 nt long, instead of 24 nt as annotated in miRBase and the literature. Their expression profiles meet current criteria for functional annotation of miRNAs. In addition, diagnostic cleavage tags were found in degradome datasets for predicted target mRNAs. Most of these miRNA hairpins share significant homology with miniature inverted-repeat transposable elements (MITEs), an abundant type of DNA transposon in rice. Finally, the root-specific production of a 24-nt miRNA-like sRNA was confirmed by RNA blot for a novel EST that maps to the 3'-UTR of a candidate pseudogene showing extensive sequence homology to the miR1884b hairpin. Our data are consistent with the hypothesis that TEs can serve as a driving force for the evolution of some MIRNAs, where co-opting of DICER-LIKE1/4 processing and integration into AGO1 could exapt transcribed TE-associated hairpins into typical miRNA pathways. PMID:23420033

  1. An aureobasidin A resistance gene isolated from Aspergillus is a homolog of yeast AUR1, a gene responsible for inositol phosphorylceramide (IPC) synthase activity.

    PubMed

    Kuroda, M; Hashida-Okado, T; Yasumoto, R; Gomi, K; Kato, I; Takesako, K

    1999-03-01

    The AUR1 gene of Saccharomyces cerevisiae, mutations in which confer resistance to the antibiotic aureobasidin A, is necessary for inositol phosphorylceramide (IPC) synthase activity. We report the molecular cloning and characterization of the Aspergillus nidulans aurA gene, which is homologous to AUR1. A single point mutation in the aurA gene of A. nidulans confers a high level of resistance to aureobasidin A. The A. nidulans aurA gene was used to identify its homologs in other Aspergillus species, including A. fumigatus, A. niger, and A. oryzae. The deduced amino acid sequence of an aurA homolog from the pathogenic fungus A. fumigatus showed 87% identity to that of A. nidulans. The AurA proteins of A. nidulans and A. fumigatus shared common characteristics in primary structure, including sequence, hydropathy profile, and N-glycosylation sites, with their S. cerevisiae, Schizosaccharomyces pombe, and Candida albicans counterparts. These results suggest that the aureobasidin resistance gene is conserved evolutionarily in various fungi.

  2. Solution structure of the DNA-binding domain of RPA from Saccharomyces cerevisiae and its interaction with single-stranded DNA and SV40 T antigen

    PubMed Central

    Park, Chin-Ju; Lee, Joon-Hwa; Choi, Byong-Seok

    2005-01-01

    Replication protein A (RPA) is a three-subunit complex with multiple roles in DNA metabolism. DNA-binding domain A in the large subunit of human RPA (hRPA70A) binds to single-stranded DNA (ssDNA) and is responsible for the species-specific RPA–T antigen (T-ag) interaction required for Simian virus 40 replication. Although Saccharomyces cerevisiae RPA70A (scRPA70A) shares high sequence homology with hRPA70A, the two are not functionally equivalent. To elucidate the similarities and differences between these two homologous proteins, we determined the solution structure of scRPA70A, which closely resembled the structure of hRPA70A. The structure of ssDNA-bound scRPA70A, as simulated by residual dipolar coupling-based homology modeling, suggested that the positioning of the ssDNA is the same for scRPA70A and hRPA70A, although the conformational changes that occur in the two proteins upon ssDNA binding are not identical. NMR titrations of hRPA70A with T-ag showed that the T-ag binding surface is separate from the ssDNA-binding region and is more neutral than the corresponding part of scRPA70A. These differences might account for the species-specific nature of the hRPA70A–T-ag interaction. Our results provide insight into how these two homologous RPA proteins can exhibit functional differences, but still both retain their ability to bind ssDNA. PMID:16043636

  3. A Delicate Balance Between Repair and Replication Factors Regulates Recombination Between Divergent DNA Sequences in Saccharomyces cerevisiae

    PubMed Central

    Chakraborty, Ujani; George, Carolyn M.; Lyndaker, Amy M.; Alani, Eric

    2016-01-01

    Single-strand annealing (SSA) is an important homologous recombination mechanism that repairs DNA double strand breaks (DSBs) occurring between closely spaced repeat sequences. During SSA, the DSB is acted upon by exonucleases to reveal complementary sequences that anneal and are then repaired through tail clipping, DNA synthesis, and ligation steps. In baker’s yeast, the Msh DNA mismatch recognition complex and the Sgs1 helicase act to suppress SSA between divergent sequences by binding to mismatches present in heteroduplex DNA intermediates and triggering a DNA unwinding mechanism known as heteroduplex rejection. Using baker’s yeast as a model, we have identified new factors and regulatory steps in heteroduplex rejection during SSA. First we showed that Top3-Rmi1, a topoisomerase complex that interacts with Sgs1, is required for heteroduplex rejection. Second, we found that the replication processivity clamp proliferating cell nuclear antigen (PCNA) is dispensable for heteroduplex rejection, but is important for repairing mismatches formed during SSA. Third, we showed that modest overexpression of Msh6 results in a significant increase in heteroduplex rejection; this increase is due to a compromise in Msh2-Msh3 function required for the clipping of 3′ tails. Thus 3′ tail clipping during SSA is a critical regulatory step in the repair vs. rejection decision; rejection is favored before the 3′ tails are clipped. Unexpectedly, Msh6 overexpression, through interactions with PCNA, disrupted heteroduplex rejection between divergent sequences in another recombination substrate. These observations illustrate the delicate balance that exists between repair and replication factors to optimize genome stability. PMID:26680658

  4. Expression of an Atriplex nummularia gene encoding a protein homologous to the bacterial molecular chaperone DnaJ.

    PubMed Central

    Zhu, J K; Shi, J; Bressan, R A; Hasegawa, P M

    1993-01-01

    DnaJ is a 36-kD heat shock protein that functions together with DnaK (Hsp70) as a molecular chaperone in Escherichia coli. We have obtained a cDNA clone from the higher plant Atriplex nummularia that encodes a 46.6-kD polypeptide (ANJ1) with an overall 35.2% amino acid sequence identity with the E. coli DnaJ. ANJ1 has 43.4% overall sequence identity with the Saccharomyces cerevisiae cytoplasmic DnaJ homolog YDJ1/MAS5. Complementation of the yeast mas5 mutation indicated that ANJ1 is a functional homolog of YDJ1/MAS5. The presence of other DnaJ homologs in A. nummularia was demonstrated by the detection of proteins that are antigenically related to the yeast mitochondrial DnaJ homolog SCJ1 and the yeast DnaJ-related protein Sec63. Expression of the ANJ1 gene was compared with that of an A. nummularia Hsp70 gene. Expression of both ANJ1 and Hsp70 transcripts was coordinately induced by heat shock. However, noncoordinate accumulation of ANJ1 and Hsp70 mRNAs occurred during the cell growth cycle and in response to NaCl stress. PMID:8467224

  5. CPHmodels-3.0--remote homology modeling using structure-guided sequence profiles.

    PubMed

    Nielsen, Morten; Lundegaard, Claus; Lund, Ole; Petersen, Thomas Nordahl

    2010-07-01

    CPHmodels-3.0 is a web server that predicts protein 3D structure by single-template homology modeling. The server employs a hybrid of the scoring functions of CPHmodels-2.0 and a novel remote homology-modeling algorithm. The server first attempts to model a query sequence using the fast CPHmodels-2.0 profile-profile scoring function, which is suited to close homology modeling. The new, computationally costly remote homology-modeling algorithm is engaged only if no suitable PDB template is identified in this initial search. CPHmodels-3.0 was benchmarked in the CASP8 competition and produced models for 94% of the targets (117 out of 128); 74% of these were predicted as high-reliability models (87 out of 117). These achieved an average RMSD of 4.6 Å when superimposed on the 3D structure. The remaining 26% low-reliability models (30 out of 117) could be superimposed on the true 3D structure with an average RMSD of 9.3 Å. These performance values place CPHmodels-3.0 in the group of high-performing 3D prediction tools. Besides its accuracy, one of the important features of the method is its speed: for most queries, the response time of the server is <20 min. The web server is available at http://www.cbs.dtu.dk/services/CPHmodels/.

  6. [The genetic evolution of the full-length genes of 5 EV71 strains from 5 Shenzhen patients with hand-foot-and-mouth disease associated with EV71 infection].

    PubMed

    Liu, Wei-long; Yang, Gui-lin; Wei, Qing; Zhang, Ming-xia; Chen, Xin-chun; Liu, Ying-xia; Gao, Yang; Zhou, Bo-ping

    2011-02-01

    This study investigated the molecular epidemiology and molecular evolution of 5 EV71 (enterovirus 71) strains from 5 Shenzhen patients with hand-foot-and-mouth disease associated with EV71 infection. The 5 EV71 strains were isolated and sequenced. Their full-length gene sequences were analyzed with bioinformatics software to compare nucleotide and amino acid homology with EV71 strains from other regions and countries, as well as with earlier strains from around the world. Analysis of the VP1 and VP4 nucleotide sequences placed all 5 strains in sub-genotype C4. The 5 strains differed very little from one another, sharing 93% nucleotide homology and 98% amino acid homology. Phylogenetic tree analysis indicated that the 2008 Shenzhen epidemic strains were most closely related to the 2004 Shenzhen circulating strains. They were also closely related to the 1998 Shenzhen epidemic strains and the 2008 Fuyang (Anhui) strains. The strain from the fatal case was very close to the 2008 Fuyang (Anhui) epidemic strains. We speculate that these epidemic EV71 strains probably originated from a common ancestral strain, possibly the 1998 Shenzhen strain.

  7. Sequencing BPS spectra

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gukov, Sergei; Nawata, Satoshi; Saberi, Ingmar

    In this article, we provide both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics. They also explain, from a physical standpoint, the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel "sliding" property, which can be explained by using the (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N = 2 theory via the 3d/3d correspondence. In conclusion, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.

  8. Sequencing BPS spectra

    DOE PAGES

    Gukov, Sergei; Nawata, Satoshi; Saberi, Ingmar; ...

    2016-03-02

  9. Amino acid sequences of the ribosomal proteins HL30 and HmaL5 from the archaebacterium Halobacterium marismortui.

    PubMed

    Hatakeyama, T; Hatakeyama, T

    1990-07-06

    The complete amino acid sequences of the ribosomal proteins HL30 and HmaL5 from the archaebacterium Halobacterium marismortui were determined. Protein HL30 was found to be acetylated at its N-terminal amino acid and shows homology to the eukaryotic ribosomal proteins YL34 from yeast and RL31 from rat. Protein HmaL5 was homologous to the protein L5 from Escherichia coli and Bacillus stearothermophilus as well as to YL16 from yeast. HmaL5 shows more similarities to its eukaryotic counterpart than to eubacterial ones.

  10. Transcriptome deep-sequencing and clustering of expressed isoforms from Favia corals

    PubMed Central

    2013-01-01

    Background Genomic and transcriptomic sequence data are essential tools for tackling ecological problems. Using an approach that combines next-generation sequencing, de novo transcriptome assembly, gene annotation and synthetic gene construction, we identify and cluster the protein families from Favia corals from the northern Red Sea. Results We obtained 80 million 75 bp paired-end cDNA reads from two Favia adult samples collected at 65 m (Fav1, Fav2) on the Illumina GA platform, and generated two de novo assemblies using ABySS and CAP3. After removing redundancy and filtering out low-quality reads, our transcriptome datasets contained 58,268 (Fav1) and 62,469 (Fav2) contigs longer than 100 bp, with N50 values of 1,665 bp and 1,439 bp, respectively. Using the proteome of the sea anemone Nematostella vectensis as a reference, we were able to annotate almost 20% of each dataset using reciprocal homology searches. Homologous clustering of these annotated transcripts allowed us to divide them into 7,186 (Fav1) and 6,862 (Fav2) homologous transcript clusters (E-value ≤ 2e-30). Functional annotation categories were assigned to homologous clusters using the functional annotation of Nematostella vectensis. General annotation of the assembled transcripts was improved by 1-3% using the Acropora digitifera proteome. In addition, we screened these transcript isoform clusters for fluorescent protein (FP) homologs and identified seven potential FP homologs in Fav1 and four in Fav2. These transcripts were validated as bona fide FP transcripts by robust fluorescence upon heterologous expression. Annotation of the assembled contigs revealed that 1.34% and 1.61% (in Fav1 and Fav2, respectively) of the total assembled contigs likely originated from the corals’ algal symbiont, Symbiodinium spp. Conclusions Here we present a study to identify the homologous transcript isoform clusters from the transcriptome of Favia corals using a distantly related reference proteome. Furthermore, the symbiont-derived transcripts were isolated from the datasets and their contribution quantified. This is the first annotated transcriptome of the genus Favia, a major increase in the genomic resources available for this important family of corals. PMID:23937070
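    Two of the quantities above can be sketched in Python: the N50 contig length, and annotation by reciprocal homology searches (reciprocal best hits). The sketch assumes the search output has already been parsed into `(query, subject, evalue)` tuples. The function names and the default E-value cutoff are illustrative assumptions, not the authors' pipeline; the 2e-30 cutoff above was applied to clustering and could be passed in instead.

```python
def n50(lengths):
    """Return N50: the length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def best_hits(hits, max_evalue):
    """Map each query to its lowest-E-value subject, ignoring hits above max_evalue."""
    best = {}
    for query, subject, evalue in hits:
        if evalue <= max_evalue and (query not in best or evalue < best[query][1]):
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-5):
    """Pairs (a, b) where b is a's best hit in set B and a is b's best hit in set A."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}
```

    A forward hit only counts as an annotation when the reverse search agrees. In the test data, contig `c2`'s best hit `nv2` prefers a different contig, so `c2` stays unannotated.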

  11. Protein structure determination by exhaustive search of Protein Data Bank derived databases.

    PubMed

    Stokes-Rees, Ian; Sliz, Piotr

    2010-12-14

    Parallel sequence and structure alignment tools have become ubiquitous and invaluable at all levels in the study of biological systems. We demonstrate the application and utility of this same parallel search paradigm to the process of protein structure determination, benefitting from the large and growing corpus of known structures. Such searches were previously computationally intractable. Through the method of Wide Search Molecular Replacement, developed here, they can be completed in a few hours with the aid of national-scale federated cyberinfrastructure. By dramatically expanding the range of models considered for structure determination, we show that small (less than 12% structural coverage) and low sequence identity (less than 20% identity) template structures can be identified through multidimensional template scoring metrics and used for structure determination. Many new macromolecular complexes can benefit significantly from such a technique due to the lack of known homologous protein folds or sequences. We demonstrate the effectiveness of the method by determining the structure of a full-length p97 homologue from Trichoplusia ni. Example cases with the MHC/T-cell receptor complex and the EmoB protein provide systematic estimates of the minimum sequence identity, structure coverage, and structural similarity required for this method to succeed. We describe how this structure-search approach and other novel computationally intensive workflows are made tractable through integration with the US national computational cyberinfrastructure, allowing, for example, rapid processing of the entire Structural Classification of Proteins protein fragment database.

  12. Low-pass sequencing for microbial comparative genomics

    PubMed Central

    Goo, Young Ah; Roach, Jared; Glusman, Gustavo; Baliga, Nitin S; Deutsch, Kerry; Pan, Min; Kennedy, Sean; DasSarma, Shiladitya; Victor Ng, Wailap; Hood, Leroy

    2004-01-01

    Background We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single-pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe. Results As expected, the four archaeal halophiles analyzed exhibit both bacterial and eukaryotic characteristics as well as uniquely archaeal traits. All five halophiles exhibit greater than sixty percent GC content and low isoelectric points (pI) for their predicted proteins. Multiple insertion sequence (IS) elements, often involved in genome rearrangements, were identified in H. lacusprofundi and H. marismortui. The core biological functions that govern cellular and genetic mechanisms of H. sp. NRC-1 appear to be conserved in these four other halophiles. Multiple TATA box binding protein (TBP) and transcription factor IIB (TFB) homologs were identified in most of the four shotgunned halophiles. The reconstructed molecular tree of all five halophiles shows a large divergence between these species, with the closest relationship between H. sp. NRC-1 and H. lacusprofundi. Conclusion Despite the diverse habitats of these species, all five halophiles share (1) high GC content and (2) low protein isoelectric points, which are characteristics associated with environmental exposure to UV radiation and hypersalinity, respectively. Identification of multiple IS elements in the genomes of H. lacusprofundi and H. marismortui suggests that their genome structure and dynamic genome reorganization might be similar to those previously observed in the IS element-rich genome of H. sp. NRC-1. Identification of multiple TBP and TFB homologs in these four halophiles is consistent with the hypothesis that different types of complex transcriptional regulation may occur through multiple TBP-TFB combinations in response to rapidly changing environmental conditions. Low-pass shotgun sequence analyses of genomes permit extensive and diverse analyses, and should be generally useful for comparative microbial genomics. PMID:14718067
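    The two shared traits, high GC content and low predicted isoelectric point, are both simple to compute. The Python sketch below is a minimal illustration. Its pKa table is one commonly cited set of terminal and side-chain values and is an assumption: different tools use different values and therefore report slightly different pI estimates.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); ambiguous codes are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


# Assumed pKa values; other published sets shift the predicted pI slightly.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(protein: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of net charge for an uppercase protein sequence."""
    counts = {aa: protein.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(protein: str, tol: float = 1e-4) -> float:
    """Bisection for the pH of zero net charge (net charge decreases as pH rises)."""
    protein = protein.upper()
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(protein, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

    Applied to a whole predicted proteome, a shift of the pI distribution toward acidic values is the "low isoelectric point" signal described above.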

  13. Secondary structural entropy in RNA switch (Riboswitch) identification.

    PubMed

    Manzourolajdad, Amirhossein; Arnold, Jonathan

    2015-04-28

    RNA regulatory elements play a significant role in gene regulation. Riboswitches, a widespread group of regulatory RNAs, are vital components of many bacterial genomes. These regulatory elements generally function by forming a ligand-induced alternative fold that controls access to ribosome binding sites or other regulatory sites in RNA. Riboswitch-mediated mechanisms are ubiquitous across bacterial genomes. A typical class of riboswitch has its own unique structural and biological complexity, making de novo riboswitch identification a formidable task. Traditionally, riboswitches have been identified through comparative genomics based on sequence and structural homology. The limitations of structural-homology-based approaches, coupled with the assumption that there is a great diversity of undiscovered riboswitches, suggest the need for alternative methods for riboswitch identification, possibly based on features intrinsic to their structure. As yet, no such reliable method has been proposed. We used the structural entropy of riboswitch sequences as a measure of their secondary structural dynamics. Entropy values of a diverse set of riboswitches were compared to those of their mutants, their dinucleotide shuffles, and their reverse complement sequences under different stochastic context-free grammar folding models. The significance of our results was evaluated by comparison with other approaches, such as base-pairing entropy and energy landscape dynamics. Classifiers based on structural entropy, optimized via sequence and structural features, were devised as riboswitch identifiers. They were tested on Bacillus subtilis, Escherichia coli, and Synechococcus elongatus as an exploration of structural entropy-based approaches. In genome-wide examinations, significant structural entropy values were associated with the unusually long untranslated region of cotH in Bacillus subtilis, as well as with upstream regions of certain genes, such as the sucC genes. Various tests show that there is in fact a relationship between higher structural entropy and the potential for an RNA sequence to have alternative structures, within the limitations of our methodology. This relationship, though modest, is consistent across various tests. Understanding the behavior of structural entropy as a fairly new feature for RNA conformational dynamics, however, may require extensive exploratory investigation across both RNA sequences and folding models.
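    Structural entropy, as used here, is the Shannon entropy of a probability distribution over the alternative secondary structures of one sequence. The Python sketch below assumes those probabilities are already available. The study derived them from stochastic context-free grammar folding models. The helper that converts free energies into Boltzmann probabilities is a swapped-in, energy-based alternative, included only for illustration.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def boltzmann_probabilities(energies_kcal, temperature_k=310.15):
    """Convert free energies (kcal/mol) of candidate structures into Boltzmann probabilities."""
    kt = R_KCAL * temperature_k
    e_min = min(energies_kcal)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / kt) for e in energies_kcal]
    z = sum(weights)  # partition function over the listed structures
    return [w / z for w in weights]


def structural_entropy(probabilities, base=2.0):
    """Shannon entropy -sum p*log(p) of a distribution over secondary structures."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)
```

    A sequence dominated by one fold scores near zero. A sequence whose probability is spread over several competing folds scores higher, which is the property the study links to alternative structures.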

  14. Direct visualization reveals kinetics of meiotic chromosome synapsis

    DOE PAGES

    Rog, Ofer; Dernburg, Abby F.

    2015-03-17

    The synaptonemal complex (SC) is a conserved protein complex that stabilizes interactions along homologous chromosomes (homologs) during meiosis. The SC regulates genetic exchanges between homologs, thereby enabling reductional division and the production of haploid gametes. Here, we directly observe SC assembly (synapsis) by optimizing methods for long-term fluorescence recording in C. elegans. We report that synapsis initiates independently on each chromosome pair at or near pairing centers, specialized regions required for homolog associations. Once initiated, the SC extends rapidly and mostly irreversibly to chromosome ends. Quantitation of SC initiation frequencies and extension rates reveals that initiation is a rate-limiting step in homolog interactions. Eliminating the dynein-driven chromosome movements that accompany synapsis severely retards SC extension, revealing a new role for these conserved motions. This work provides the first opportunity to directly observe and quantify key aspects of meiotic chromosome interactions and will enable future in vivo analysis of germline processes.

  15. Colonization of heterochromatic genes by transposable elements in Drosophila.

    PubMed

    Dimitri, Patrizio; Junakovic, Nikolaj; Arcà, Bruno

    2003-04-01

    As a further step toward understanding transposable element-host genome interactions, we investigated the molecular anatomy of introns from five heterochromatic and 22 euchromatic protein-coding genes of Drosophila melanogaster. A total of 79 kb of intronic sequences from heterochromatic genes and 355 kb of intronic sequences from euchromatic genes were used in BLAST searches against Drosophila transposable elements (TEs). The results show that TE-homologous sequences belonging to 19 different families represent about 50% of intronic DNA from heterochromatic genes. In contrast, only 0.1% of the euchromatic intron DNA exhibits homology to known TEs. Intraspecific and interspecific size polymorphisms of introns were found, which are likely to be associated with changes in TE-related sequences. Together, the enrichment in TEs and the apparent dynamic state of heterochromatic introns suggest that TEs contribute significantly to the evolution of genes located in heterochromatin.
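    Figures such as "about 50%" versus "0.1%" are coverage fractions: intronic bases overlapped by at least one TE hit, divided by total intronic length. The minimal Python sketch below merges overlapping hits so that no base is counted twice. It assumes hits have already been converted to half-open `[start, end)` intron coordinates. BLAST reports 1-based inclusive coordinates, so `(start - 1, end)` is one possible conversion. The dictionary layout and function names are illustrative.

```python
def merged_coverage(intervals):
    """Total length covered by possibly overlapping half-open intervals [start, end)."""
    covered, cur_start, cur_end = 0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: bank the run and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered


def te_fraction(hits_by_intron, intron_lengths):
    """Fraction of all intronic bases overlapped by at least one TE hit."""
    total = sum(intron_lengths.values())
    covered = sum(merged_coverage(hits_by_intron.get(name, [])) for name in intron_lengths)
    return covered / total if total else 0.0
```

    Summing raw hit lengths instead would over-count regions matched by several TE families.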

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Leong, JoAnn Ching

    The nucleotide sequence of the IHNV glycoprotein gene has been determined from a cDNA clone containing the entire coding region. The glycoprotein cDNA clone contained a leader sequence of 48 bases, a coding region of 1524 nucleotides, and 39 bases at the 3′ end. The entire cDNA clone contains 1609 nucleotides and encodes a protein of 508 amino acids. The deduced amino acid sequence gave a translated molecular weight of 56,795 daltons. A hydropathicity profile of the deduced amino acid sequence indicated two major hydrophobic domains: one at the N-terminus, delineating a signal peptide of 18 amino acids, and the other at the C-terminus, delineating the transmembrane region. Five possible sites of N-linked glycosylation were identified. Although no nucleic acid homology existed between the IHNV glycoprotein gene and the glycoprotein genes of rabies virus and VSV, there was significant homology at the amino acid level among all three rhabdovirus glycoproteins.
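    The hydropathicity profile and the count of possible N-linked glycosylation sites both come from simple scans of the deduced protein sequence. The Python sketch below uses two standard scans:

    * the Kyte-Doolittle scale averaged over a sliding window;
    * the N-X-S/T sequon, where X is any residue except proline.

    The window length of 19 is an assumed choice often used to spot transmembrane segments, not a parameter reported here. Sequon matches are only candidate sites.

```python
import re

# Kyte-Doolittle hydropathy scale (standard 20 amino acids only).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean Kyte-Doolittle score of each full window; entry i covers residues i..i+window-1.
    Raises KeyError on non-standard residue letters."""
    scores = [KD[aa] for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window for i in range(len(scores) - window + 1)]


def n_glycosylation_sequons(protein: str) -> list[int]:
    """0-based positions of N in N-X-S/T motifs with X != P (lookahead allows overlaps)."""
    return [m.start() for m in re.finditer(r"(?=N[^P][ST])", protein.upper())]
```

    Windows with a high mean score near the N- and C-termini would correspond to the signal peptide and transmembrane domains described above.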

  17. The Porcelain Crab Transcriptome and PCAD, the Porcelain Crab Microarray and Sequence Database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tagmount, Abderrahmane; Wang, Mei; Lindquist, Erika

    2010-01-27

    Background: With the emergence of a completed genome sequence of the freshwater crustacean Daphnia pulex, construction of genomic-scale sequence databases for additional crustacean sequences is important for comparative genomics and annotation. Porcelain crabs, genus Petrolisthes, have been powerful crustacean models for environmental and evolutionary physiology with respect to thermal adaptation and understanding responses of marine organisms to climate change. Here, we present a large-scale EST sequencing and cDNA microarray database project for the porcelain crab Petrolisthes cinctipes. Methodology/Principal Findings: A set of ~30K unique sequences (UniSeqs) representing ~19K clusters was generated from ~98K high-quality ESTs. These ESTs came from a set of tissue-specific non-normalized and mixed-tissue normalized cDNA libraries from the porcelain crab Petrolisthes cinctipes. Homology for each UniSeq was assessed using BLAST, InterProScan, GO and KEGG database searches. Approximately 66 percent of the UniSeqs had homology in at least one of the databases. All EST and UniSeq sequences, along with annotation results and coordinated cDNA microarray datasets, have been made publicly accessible at the Porcelain Crab Array Database (PCAD), a feature-enriched version of the Stanford and Longhorn Array Databases. Conclusions/Significance: The EST project presented here represents the third largest sequencing effort for any crustacean, and the largest effort for any crab species. Our assembly and clustering results suggest that our porcelain crab EST data set is as diverse as the much larger EST set generated in the Daphnia pulex genome sequencing project, and thus will be an important resource to the Daphnia research community. Our homology results support the pancrustacea hypothesis and suggest that Malacostraca may be ancestral to Branchiopoda and Hexapoda. Our results also suggest that our cDNA microarrays cover as much of the transcriptome as can reasonably be captured in EST library sequencing approaches, and thus represent a rich resource for studies of environmental genomics.

  18. Protein 3D Structure Computed from Evolutionary Sequence Variation

    PubMed Central

    Sheridan, Robert; Hopf, Thomas A.; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris

    2011-01-01

    The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org).
This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes. PMID:22163331
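The abstract's starting point is the "noisy set of observed correlations" between alignment columns. As a hedged illustration of that raw signal only, the sketch below scores a pair of columns by mutual information. It is not the maximum entropy model used by EVfold, whose purpose is precisely to separate direct couplings from the indirect correlations that mutual information also picks up; the function name and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (nats) between columns i and j of an alignment.

    Sketch only: no sequence weighting, no pseudocounts, and no attempt to
    separate direct from indirect (transitive) correlations.
    """
    n = len(msa)
    f_i = Counter(seq[i] for seq in msa)                 # single-column counts
    f_j = Counter(seq[j] for seq in msa)
    f_ij = Counter((seq[i], seq[j]) for seq in msa)      # pair counts
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi


# Hypothetical four-sequence alignment: columns 0 and 3 co-vary, column 2 does not.
msa = ["ACDK", "ACEK", "GCDR", "GCER"]
print(column_mutual_information(msa, 0, 3))  # log(2), about 0.693
print(column_mutual_information(msa, 0, 2))  # 0.0
```

A high score here only says two columns are correlated; in a real alignment, chains of direct contacts (i with k, k with j) also inflate the i-j score, which is the confound the maximum entropy approach is meant to remove.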

  19. [Sequencing and analysis of the complete genome of a rabies virus isolate from Sika deer].

    PubMed

    Zhao, Yun-Jiao; Guo, Li; Huang, Ying; Zhang, Li-Shi; Qian, Ai-Dong

    2008-05-01

    One DRV strain was isolated from Sika deer brain and sequenced. Nine overlapping gene fragments were amplified by RT-PCR together with 3'-RACE and 5'-RACE, and the complete DRV genome sequence was assembled. The complete genome is 11,863 bp long. The DRV genome organization was similar to that of other rabies viruses, consisting of five genes, and the initiation and termination sites were highly conserved. Amino acid mutations were found in important antigenic sites of the nucleoprotein and glycoprotein. The nucleotide and amino acid homologies of the N, P, M, G and L genes were compared among strains with complete genome sequences, and a phylogenetic tree was constructed from the N gene sequences of DRV and other typical rabies viruses. These results indicated that DRV belongs to genotype 1. The highest homology (94%) was with the Chinese vaccine strain 3aG, and the lowest (71%) was with WCBV. These findings provide a reference for further research on rabies virus.

  20. Mechanism of action of chromogranin A on catecholamine release: molecular modeling of the catestatin region reveals a β-strand/loop/β-strand structure secured by hydrophobic interactions and predictive of activity

    PubMed Central

    Tsigelny, Igor; Mahata, Sushil K.; Taupenot, Laurent; Preece, Nicholas E.; Mahata, Manjula; Khan, Imran; Parmer, Robert J.; O’Connor, Daniel T.

    2009-01-01

    A novel fragment of chromogranin A, known as ‘catestatin’ (bovine chromogranin A344–364), inhibits catecholamine release from chromaffin cells and noradrenergic neurons by acting as a non-competitive nicotinic cholinergic antagonist, and may therefore constitute an endogenous autocrine feedback regulator of sympathoadrenal activity. To characterize how this activity depends on the peptide’s structure, we searched for common 3-dimensional motifs for this primary structure or its homologs. Catestatin’s primary structure bore significant (29–35.5% identity, general alignment score 44–57) sequence homology to fragment sequences within three homologs of known 3-dimensional structures, based on solved X-ray crystals: 8FAB, 1PKM, and 2IG2. Each of these sequences exists in nature as a β-strand/loop/β-strand structure, stabilized by hydrophobic interactions between the β-strands. The catestatin structure was stable during molecular dynamics simulations. The catestatin loop contains three Arg residues, whose electropositive side chains form the terminus of the structure, and give rise to substantial uncompensated charge asymmetry in the molecule. A hydrophobic moment plot revealed that catestatin is the only segment of chromogranin A predicted to contain amphiphilic β-strand. Circular dichroism in the far ultraviolet showed substantial (63%) β-sheet structure, especially in a hydrophobic environment. Alanine-substitution mutants of catestatin established a crucial role for the three central arginine residues in the loop (Arg351, Arg353, and Arg358), though not for two arginine residues in the strand region toward the amino-terminus. [125I]Catestatin bound to Torpedo membranes at a site other than the nicotinic agonist binding site. 
When the catestatin structure was ‘docked’ with the extracellular domain of the Torpedo nicotinic cholinergic receptor, it interacted principally with the β and δ subunits, in a relatively hydrophobic region of the cation pore extracellular orifice, and the complex of ligand and receptor largely occluded the cation pore, providing a structural basis for the non-competitive nicotinic cholinergic antagonist properties of the peptide. We conclude that a homology model of catestatin correctly predicts actual features of the peptide, both physical and biological. The model suggests particular spatial and charge features of the peptide which may serve as starting points in the development of non-peptide mimetics of this endogenous nicotinic cholinergic antagonist. PMID:9809795

  1. Evolving the Concept of Homology

    ERIC Educational Resources Information Center

    Naples, Virginia L.; Miller, Jon S.

    2009-01-01

    Understanding homology is fundamental to learning about evolution. The present study shows an exercise that can be varied in complexity, for which students compile research illustrating the fate of homologous fish skull elements, and assemble a mural to serve as a learning aid. The skull of the most primitive living Actinopterygian (bony fish),…

  2. DNA homology among diverse spiroplasma strains representing several serological groups.

    PubMed

    Lee, I M; Davis, R E

    1980-11-01

    Deoxyribonucleic acid (DNA) homology among 10 strains of spiroplasma associated with plants and insects was assessed by analysis of DNA-DNA hybrids with single strand specific S1 nuclease. Based on DNA homology, the spiroplasmas could be divided into three genetically distinct groups (designated I, II, and III), corresponding to three separate serogroups described previously. DNA sequence homology between the three groups was less than or equal to 5%. Based on DNA homology, group I could be divided into three subgroups (A, B, and C) that corresponded to three serological subgroups of serogroup I. Subgroup A contained Spiroplasma citri strains Maroc R8A2 and C 189; subgroup B contained strains AS 576 from honey bee and G 1 from flowers; subgroup C contained corn stunt spiroplasma strains I-747 and PU 8-17. There was 27-54% DNA sequence homology among these three subgroups. Group II contained strains 23-6 and 27-31 isolated from flowers of tulip tree (Liriodendron tulipifera L.). Group III contained strains SR 3 and SR 9, other isolates from flowers of tulip tree. Based on thermal denaturation, guanine plus cytosine contents of DNA from five type strains representing all groups and subgroups were estimated to be close to 26 mol% for group I strains, close to 25 mol% for group II strains, and close to 29 mol% for group III strains. The genome molecular weights of these five type strains were all estimated to be about 10^9.

  3. Comparative analysis of expressed sequence tags of conifers and angiosperms reveals sequences specifically conserved in conifers.

    PubMed

    Ujino-Ihara, Tokuko; Kanamori, Hiroyuki; Yamane, Hiroko; Taguchi, Yuriko; Namiki, Nobukazu; Mukai, Yuzuru; Yoshimura, Kensuke; Tsumura, Yoshihiko

    2005-12-01

    To identify and characterize lineage-specific genes of conifers, two sets of ESTs (with 12791 and 5902 ESTs, representing 5373 and 3018 gene transcripts, respectively) were generated from the Cupressaceae species Cryptomeria japonica and Chamaecyparis obtusa. These transcripts were compared with non-redundant sets of genes generated from Pinaceae species, other gymnosperms and angiosperms. About 6% of tentative unique genes (Unigenes) of C. japonica and C. obtusa had homologs in other conifers but not angiosperms, and about 70% had apparent homologs in angiosperms. The calculated GC contents of orthologous genes showed that GC contents of coniferous genes are likely to be lower than those of angiosperms. Comparisons of the numbers of homologous genes in each species suggest that copy numbers of genes may be correlated between diverse seed plants. This correlation suggests that the multiplicity of such genes may have arisen before the divergence of gymnosperms and angiosperms.
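The comparison above rests on per-gene GC content. A minimal sketch of that calculation follows; the function name and the two short sequences are hypothetical, and the abstract does not say whether the study used whole coding sequences or particular codon positions, so this shows only the basic percentage.

```python
def gc_content(seq):
    """Percent G+C among unambiguous bases; ambiguous letters such as N are ignored."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence has no A, C, G or T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Hypothetical ortholog pair, for illustration only.
conifer_gene = "ATGAAATTTGCA"
angiosperm_gene = "ATGGCGCTCGCA"
print(gc_content(conifer_gene), gc_content(angiosperm_gene))
```

Excluding ambiguous letters keeps N-rich stretches from diluting the percentage; whether that matches the authors' handling is an assumption.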

  4. Camelid Ig V genes reveal significant human homology not seen in therapeutic target genes, providing for a powerful therapeutic antibody platform

    PubMed Central

    Klarenbeek, Alex; Mazouari, Khalil El; Desmyter, Aline; Blanchetot, Christophe; Hultberg, Anna; de Jonge, Natalie; Roovers, Rob C; Cambillau, Christian; Spinelli, Sylvia; Del-Favero, Jurgen; Verrips, Theo; de Haard, Hans J; Achour, Ikbel

    2015-01-01

    Camelid immunoglobulin variable (IGV) regions were found to be homologous to their human counterparts; however, the germline V repertoires of camelid heavy and light chains are still incomplete and their therapeutic potential is only beginning to be appreciated. We therefore leveraged the publicly available HTG and WGS databases of Lama pacos and Camelus ferus to retrieve the germline repertoire of V genes using human IGV genes as reference. In addition, we amplified IGKV and IGLV genes to uncover the V germline repertoire of Lama glama and sequenced BAC clones covering part of the Lama pacos IGK and IGL loci. Our in silico analysis showed that camelid counterparts of all human IGKV and IGLV families and most IGHV families could be identified, based on canonical structure and sequence homology. Interestingly, this sequence homology seemed largely restricted to the Ig V genes and was far less apparent in other genes: 6 therapeutically relevant target genes differed significantly from their human orthologs. This contributed to efficient immunization of llamas with the human proteins CD70, MET, interleukin (IL)-1β and IL-6, resulting in large panels of functional antibodies. The in silico predicted human-homologous canonical folds of camelid-derived antibodies were confirmed by X-ray crystallography solving the structure of 2 selected camelid anti-CD70 and anti-MET antibodies. These antibodies showed the same fold combinations as those found in the corresponding human germline V families, yielding binding site structures closely similar to those occurring in human antibodies. In conclusion, our results indicate that active immunization of camelids can be a powerful therapeutic antibody platform. PMID:26018625

  5. Microarray analysis of gene expression profiles in ripening pineapple fruits.

    PubMed

    Koia, Jonni H; Moyle, Richard L; Botella, Jose R

    2012-12-18

    Pineapple (Ananas comosus) is a tropical fruit crop of significant commercial importance. Although the physiological changes that occur during pineapple fruit development have been well characterized, little is known about the molecular events that occur during the fruit ripening process. Understanding the molecular basis of pineapple fruit ripening will aid the development of new varieties via molecular breeding or genetic modification. In this study we developed a 9277-element pineapple microarray and used it to profile gene expression changes that occur during pineapple fruit ripening. Microarray analyses identified 271 unique cDNAs differentially expressed at least 1.5-fold between the mature green and mature yellow stages of pineapple fruit ripening. Among these 271 sequences, 184 share significant homology with genes encoding proteins of known function, 53 share homology with genes encoding proteins of unknown function and 34 share no significant homology with any database accession. Of the 237 pineapple sequences with homologs, 160 were up-regulated and 77 were down-regulated during pineapple fruit ripening. DAVID Functional Annotation Cluster (FAC) analysis of all 237 sequences with homologs revealed confident enrichment scores for redox activity, organic acid metabolism, metalloenzyme activity, glycolysis, vitamin C biosynthesis, antioxidant activity and cysteine peptidase activity, indicating the functional significance and importance of these processes and pathways during pineapple fruit development. Quantitative real-time PCR analysis validated the microarray expression results for nine out of ten genes tested. This is the first report of a microarray-based gene expression study undertaken in pineapple. Our bioinformatic analyses of the transcript profiles have identified a number of genes, processes and pathways with putative involvement in the pineapple fruit ripening process.
This study extends our knowledge of the molecular basis of pineapple fruit ripening and non-climacteric fruit ripening in general.

  6. Microarray analysis of gene expression profiles in ripening pineapple fruits

    PubMed Central

    2012-01-01

    Background Pineapple (Ananas comosus) is a tropical fruit crop of significant commercial importance. Although the physiological changes that occur during pineapple fruit development have been well characterized, little is known about the molecular events that occur during the fruit ripening process. Understanding the molecular basis of pineapple fruit ripening will aid the development of new varieties via molecular breeding or genetic modification. In this study we developed a 9277-element pineapple microarray and used it to profile gene expression changes that occur during pineapple fruit ripening. Results Microarray analyses identified 271 unique cDNAs differentially expressed at least 1.5-fold between the mature green and mature yellow stages of pineapple fruit ripening. Among these 271 sequences, 184 share significant homology with genes encoding proteins of known function, 53 share homology with genes encoding proteins of unknown function and 34 share no significant homology with any database accession. Of the 237 pineapple sequences with homologs, 160 were up-regulated and 77 were down-regulated during pineapple fruit ripening. DAVID Functional Annotation Cluster (FAC) analysis of all 237 sequences with homologs revealed confident enrichment scores for redox activity, organic acid metabolism, metalloenzyme activity, glycolysis, vitamin C biosynthesis, antioxidant activity and cysteine peptidase activity, indicating the functional significance and importance of these processes and pathways during pineapple fruit development. Quantitative real-time PCR analysis validated the microarray expression results for nine out of ten genes tested. Conclusions This is the first report of a microarray-based gene expression study undertaken in pineapple. Our bioinformatic analyses of the transcript profiles have identified a number of genes, processes and pathways with putative involvement in the pineapple fruit ripening process.
This study extends our knowledge of the molecular basis of pineapple fruit ripening and non-climacteric fruit ripening in general. PMID:23245313

  7. Natural non-homologous recombination led to the emergence of a duplicated V3-NS5A region in HCV-1b strains associated with hepatocellular carcinoma.

    PubMed

    Le Guillou-Guillemette, Hélène; Pivert, Adeline; Bouthry, Elise; Henquell, Cécile; Petsaris, Odile; Ducancelle, Alexandra; Veillon, Pascal; Vallet, Sophie; Alain, Sophie; Thibault, Vincent; Abravanel, Florence; Rosenberg, Arielle A; André-Garnier, Elisabeth; Bour, Jean-Baptiste; Baazia, Yazid; Trimoulet, Pascale; André, Patrice; Gaudy-Graffin, Catherine; Bettinger, Dominique; Larrat, Sylvie; Signori-Schmuck, Anne; Saoudin, Hénia; Pozzetto, Bruno; Lagathu, Gisèle; Minjolle-Cha, Sophie; Stoll-Keller, Françoise; Pawlotsky, Jean-Michel; Izopet, Jacques; Payan, Christopher; Lunel-Fabiani, Françoise; Lemaire, Christophe

    2017-01-01

    The emergence of new strains in RNA viruses is mainly due to mutations or intra- and inter-genotype homologous recombination. Non-homologous recombination events may be deleterious and are rarely detected. In previous studies, we identified HCV-1b strains bearing two tandemly repeated V3 regions in the NS5A gene without ORF disruption. This polymorphism may be associated with an unfavorable course of liver disease and possibly involved in liver carcinogenesis. Here we aimed to characterize the origin of these mutant strains and to identify the evolutionary mechanism underlying the V3 duplication. Direct sequencing of the entire NS5A and E1 genes was performed on 27 mutant strains. Quasispecies analyses in consecutive samples were also performed by cloning and sequencing the NS5A gene for all mutant and wild-type strains. We analyzed the mutant and wild-type sequence polymorphisms using Bayesian methods to infer the evolutionary history of, and the molecular mechanism leading to, the duplication-like event. Each quasispecies was composed exclusively of either mutant or wild-type strains. Mutant quasispecies had been present since infection and had persisted for at least 10 years. This V3 duplication-like event appears to have resulted from non-homologous recombination between HCV-1b wild-type strains around 100 years ago. The association between increased liver disease severity and these HCV-1b mutants may explain their persistence in chronically infected patients. These results emphasize the possible consequences of non-homologous recombination in the emergence and severity of new viral diseases.

  8. Clustering evolving proteins into homologous families.

    PubMed

    Chan, Cheong Xin; Mahbob, Maisarah; Ragan, Mark A

    2013-04-08

    Clustering sequences into groups of putative homologs (families) is a critical first step in many areas of comparative biology and bioinformatics. The performance of clustering approaches in delineating biologically meaningful families depends strongly on characteristics of the data, including content bias and degree of divergence. New, highly scalable methods have recently been introduced to cluster the very large datasets being generated by next-generation sequencing technologies. However, there has been little systematic investigation of how characteristics of the data impact the performance of these approaches. Using clusters from a manually curated dataset as reference, we examined the performance of a widely used graph-based Markov clustering algorithm (MCL) and a greedy heuristic approach (UCLUST) in delineating protein families coded by three sets of bacterial genomes of different G+C content. Both MCL and UCLUST generated clusters that are comparable to the reference sets at specific parameter settings, although UCLUST tends to under-cluster compositionally biased sequences (G+C content 33% and 66%). Using simulated data, we sought to assess the individual effects of sequence divergence, rate heterogeneity, and underlying G+C content. Performance decreased with increasing sequence divergence, decreasing among-site rate variation, and increasing G+C bias. Two MCL-based methods recovered the simulated families more accurately than did UCLUST. MCL using local alignment distances is more robust across the investigated range of sequence features than are greedy heuristics using distances based on global alignment. Our results demonstrate that sequence divergence, rate heterogeneity and content bias can individually and in combination affect the accuracy with which MCL and UCLUST can recover homologous protein families. 
For application to data that are more divergent, and exhibit higher among-site rate variation and/or content bias, MCL may often be the better choice, especially if computational resources are not limiting.
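To make the graph-based approach concrete, here is a minimal sketch of the core Markov clustering (MCL) iteration: alternating expansion (matrix power) and inflation (elementwise power followed by column renormalization) on a similarity graph. It is illustrative only; the function name, parameter defaults, and toy graph are hypothetical, and it omits the pruning and sparse-matrix handling of the reference mcl program as well as the similarity scores and parameter settings used in the study.

```python
import numpy as np


def markov_cluster(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-6):
    """Minimal Markov clustering (MCL) on a symmetric similarity matrix.

    Illustrative only: no pruning, no sparse matrices, no tuned parameters.
    """
    m = np.asarray(adjacency, dtype=float) + np.eye(len(adjacency))  # add self-loops
    m = m / m.sum(axis=0)  # column-normalize into a stochastic (random-walk) matrix
    for _ in range(max_iter):
        prev = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: flow spreads along longer paths
        m = m ** inflation                        # inflation: boost strong flows, damp weak ones
        m = m / m.sum(axis=0)                     # renormalize columns
        if np.abs(m - prev).max() < tol:
            break
    # Rows that keep mass are attractors; the columns they hold form one cluster.
    clusters = set()
    for row in m:
        members = tuple(int(k) for k in np.nonzero(row > tol)[0])
        if members:
            clusters.add(members)
    return sorted(clusters)


# Hypothetical graph: two protein triangles with no similarity between them.
graph = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    graph[a, b] = graph[b, a] = 1.0

print(markov_cluster(graph))  # [(0, 1, 2), (3, 4, 5)]
```

The inflation exponent is the main granularity knob: larger values tend to split the graph into more, smaller families, which is one reason performance in benchmarks like the one above depends on parameter settings.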

  9. Noncytopathogenic Pestivirus Strains Generated by Nonhomologous RNA Recombination: Alterations in the NS4A/NS4B Coding Region

    PubMed Central

    Gallei, Andreas; Orlich, Michaela; Thiel, Heinz-Juergen; Becher, Paul

    2005-01-01

    Several studies have demonstrated that cytopathogenic (cp) pestivirus strains evolve from noncytopathogenic (noncp) viruses by nonhomologous RNA recombination. In addition, two recent reports showed the rapid emergence of noncp Bovine viral diarrhea virus (BVDV) after a few cell culture passages of cp BVDV strains by homologous recombination between identical duplicated viral sequences. To allow the identification of recombination sites from noncp BVDV strains that evolve from cp viruses, we constructed the cp BVDV strains CP442 and CP552. Both harbor duplicated viral sequences of different origin flanking the cellular insertion Nedd8*; the latter is a prerequisite for their cytopathogenicity. In contrast to the previous studies, isolation of noncp strains was possible only after extensive cell culture passages of CP442 and CP552. Sequence analysis of 15 isolated noncp BVDVs confirmed that all recombinant strains lack at least most of Nedd8*. Interestingly, only one strain resulted from homologous recombination while the other 14 strains were generated by nonhomologous recombination. Accordingly, our data suggest that the extent of sequence identity between participating sequences influences both frequency and mode (homologous versus nonhomologous) of RNA recombination in pestiviruses. Further analyses of the noncp recombinant strains revealed that a duplication of 14 codons in the BVDV nonstructural protein 4B (NS4B) gene does not interfere with efficient viral replication. Moreover, an insertion of viral sequences between the NS4A and NS4B genes was well tolerated. These findings thus led to the identification of two genomic loci which appear to be suited for the insertion of heterologous sequences into the genomes of pestiviruses and related viruses. PMID:16254361

  10. Isolation of a novel Orientia species (O. chuto sp. nov.) from a patient infected in Dubai.

    PubMed

    Izzard, Leonard; Fuller, Andrew; Blacksell, Stuart D; Paris, Daniel H; Richards, Allen L; Aukkanit, Nuntipa; Nguyen, Chelsea; Jiang, Ju; Fenwick, Stan; Day, Nicholas P J; Graves, Stephen; Stenos, John

    2010-12-01

    In July 2006, an Australian tourist returning from Dubai, in the United Arab Emirates (UAE), developed acute scrub typhus. Her signs and symptoms included fever, myalgia, headache, rash, and eschar. Orientia tsutsugamushi serology demonstrated a 4-fold rise in antibody titers in paired serum collections (1:512 to 1:8,192), with the sera reacting strongest against the Gilliam strain antigen. An Orientia species was isolated by in vitro culture of the patient's acute-phase blood taken prior to antibiotic treatment. The 16S rRNA gene (rrs), part of the 56-kDa gene, and the full open reading frame of the 47-kDa gene were sequenced, and comparisons of this new Orientia sp. isolate to previously characterized strains demonstrated significant sequence diversity. The closest homology to the rrs sequence of the new Orientia sp. isolate was with three strains of O. tsutsugamushi (Ikeda, Kato, and Karp), with a nucleotide sequence similarity of 98.5%. The closest homology to the 47-kDa gene sequence was with O. tsutsugamushi strain Gilliam, with a nucleotide similarity of 82.3%, while the closest homology to the 56-kDa gene sequence was with O. tsutsugamushi strain TA686, with a nucleotide similarity of 53.1%. The molecular divergence and geographically unique origin lead us to believe that this organism should be considered a novel species. Therefore, we have proposed the name "Orientia chuto," and the prototype strain of this species is strain Dubai, named after the location in which the patient was infected.

  11. Isolation of a Novel Orientia Species (O. chuto sp. nov.) from a Patient Infected in Dubai

    PubMed Central

    Izzard, Leonard; Fuller, Andrew; Blacksell, Stuart D.; Paris, Daniel H.; Richards, Allen L.; Aukkanit, Nuntipa; Nguyen, Chelsea; Jiang, Ju; Fenwick, Stan; Day, Nicholas P. J.; Graves, Stephen; Stenos, John

    2010-01-01

    In July 2006, an Australian tourist returning from Dubai, in the United Arab Emirates (UAE), developed acute scrub typhus. Her signs and symptoms included fever, myalgia, headache, rash, and eschar. Orientia tsutsugamushi serology demonstrated a 4-fold rise in antibody titers in paired serum collections (1:512 to 1:8,192), with the sera reacting strongest against the Gilliam strain antigen. An Orientia species was isolated by in vitro culture of the patient's acute-phase blood taken prior to antibiotic treatment. The 16S rRNA gene (rrs), part of the 56-kDa gene, and the full open reading frame of the 47-kDa gene were sequenced, and comparisons of this new Orientia sp. isolate to previously characterized strains demonstrated significant sequence diversity. The closest homology to the rrs sequence of the new Orientia sp. isolate was with three strains of O. tsutsugamushi (Ikeda, Kato, and Karp), with a nucleotide sequence similarity of 98.5%. The closest homology to the 47-kDa gene sequence was with O. tsutsugamushi strain Gilliam, with a nucleotide similarity of 82.3%, while the closest homology to the 56-kDa gene sequence was with O. tsutsugamushi strain TA686, with a nucleotide similarity of 53.1%. The molecular divergence and geographically unique origin lead us to believe that this organism should be considered a novel species. Therefore, we have proposed the name “Orientia chuto,” and the prototype strain of this species is strain Dubai, named after the location in which the patient was infected. PMID:20926708

  12. Extensive characterization of Tupaia belangeri neuropeptidome using an integrated mass spectrometric approach.

    PubMed

    Petruzziello, Filomena; Fouillen, Laetitia; Wadensten, Henrik; Kretz, Robert; Andren, Per E; Rainer, Gregor; Zhang, Xiaozhe

    2012-02-03

    Neuropeptidomics is used to characterize endogenous peptides in the brain of tree shrews (Tupaia belangeri). Tree shrews are small animals similar to rodents in size but close relatives of primates, and are excellent models for brain research. Currently, no complete proteome information is available for tree shrews against which a direct database search could be performed for neuropeptide identification. To improve the identification of neuropeptides in tree shrews, we developed an integrated mass spectrometry (MS)-based approach that combines data-dependent, directed, and targeted liquid chromatography (LC)-Fourier transform (FT)-tandem MS (MS/MS) analysis, database construction, de novo sequencing, precursor protein search, and homology analysis. Using this integrated approach, we identified 107 endogenous peptides that have sequences identical or similar to those from other mammalian species. High-accuracy MS and tandem MS information, together with BLAST analysis and chromatographic characteristics, were used to confirm the sequences of all the identified peptides. Interestingly, further sequence homology analysis demonstrated that tree shrew peptides have a significantly higher degree of homology to equivalent sequences in humans than to those in mice or rats, consistent with the close phylogenetic relationship between tree shrews and primates. Our results provide the first extensive characterization of the peptidome in tree shrews, which now permits characterization of their function in the nervous and endocrine systems. Because the approach fully exploits the evolutionary conservation of neuropeptides and the advantages of high-accuracy MS, it can be applied to the identification of neuropeptides in other species for which fully sequenced genomes or proteomes are not available.

  13. Characterization and Nucleotide Sequence of CARB-6, a New Carbenicillin-Hydrolyzing β-Lactamase from Vibrio cholerae

    PubMed Central

    Choury, Danièle; Aubert, Gérald; Szajnert, Marie-France; Azibi, Kemal; Delpech, Marc; Paul, Gérard

    1999-01-01

    A clinical strain of Vibrio cholerae non-O1 non-O139 isolated in France produced a new β-lactamase with a pI of 5.35. The purified enzyme, with a molecular mass of 33,000 Da, was characterized. Its kinetic constants show it to be a carbenicillin-hydrolyzing enzyme comparable to the five previously reported CARB β-lactamases and to SAR-1, another carbenicillin-hydrolyzing β-lactamase that has a pI of 4.9 and that is produced by a V. cholerae strain from Tanzania. This β-lactamase is designated CARB-6, and the gene for CARB-6 could not be transferred to Escherichia coli K-12 by conjugation. The nucleotide sequence of the structural gene was determined by direct sequencing of PCR-generated fragments from plasmid DNA with four pairs of primers covering the whole sequence of the reference CARB-3 gene. The gene encodes a 288-amino-acid protein that shares 94% homology with the CARB-1, CARB-2, and CARB-3 enzymes, 93% homology with the Proteus mirabilis N29 enzyme, and 86.5% homology with the CARB-4 enzyme. The sequence of CARB-6 differs from those of CARB-3, CARB-2, CARB-1, N29, and CARB-4 at 15, 16, 17, 19, and 37 amino acid positions, respectively. All these mutations are located in the C-terminal region of the sequence and at the surface of the molecule, according to the crystal structure of the Staphylococcus aureus PC-1 β-lactamase. PMID:9925522
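The percentages above can be cross-checked against the counts of differing positions with simple arithmetic. The sketch below assumes that "homology" here means ungapped percent identity over all 288 residues of CARB-6; that reading is an assumption, and the function name is hypothetical.

```python
def percent_identity(length, n_differences):
    """Ungapped percent identity: share of positions that match."""
    return 100.0 * (length - n_differences) / length


CARB6_LENGTH = 288  # residues, from the abstract
# Number of amino acid positions at which CARB-6 differs from each enzyme (from the abstract).
differences = {"CARB-3": 15, "CARB-2": 16, "CARB-1": 17, "N29": 19, "CARB-4": 37}

for enzyme, n_diff in differences.items():
    print(f"{enzyme}: {percent_identity(CARB6_LENGTH, n_diff):.1f}%")
```

This gives about 94.1 to 94.8% for CARB-1 to CARB-3 and 93.4% for N29, in line with the reported 94% and 93%. The naive CARB-4 estimate (about 87.2%) sits slightly above the reported 86.5%, so the published figure may rest on a different alignment or counting basis; the sketch only illustrates the arithmetic.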

  14. Annotating Protein Functional Residues by Coupling High-Throughput Fitness Profile and Homologous-Structure Analysis

    PubMed Central

    Du, Yushen; Wu, Nicholas C.; Jiang, Lin; Zhang, Tianhao; Gong, Danyang; Shu, Sara; Wu, Ting-Ting

    2016-01-01

    ABSTRACT Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available. PMID:27803181

  15. Functionally conserved enhancers with divergent sequences in distant vertebrates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Song; Oksenberg, Nir; Takayama, Sachiko

    To examine the contributions of sequence and function conservation in the evolution of enhancers, we systematically identified enhancers whose sequences are not conserved among distant groups of vertebrate species, but have homologous function and are likely to be derived from a common ancestral sequence. In conclusion, our approach combined comparative genomics and epigenomics to identify potential enhancer sequences in the genomes of three groups of distantly related vertebrate species.

  16. Functionally conserved enhancers with divergent sequences in distant vertebrates

    DOE PAGES

    Yang, Song; Oksenberg, Nir; Takayama, Sachiko; ...

    2015-10-30

    To examine the contributions of sequence and function conservation in the evolution of enhancers, we systematically identified enhancers whose sequences are not conserved among distant groups of vertebrate species, but have homologous function and are likely to be derived from a common ancestral sequence. In conclusion, our approach combined comparative genomics and epigenomics to identify potential enhancer sequences in the genomes of three groups of distantly related vertebrate species.

  17. DSAP: deep-sequencing small RNA analysis pipeline.

    PubMed

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a complete solution for analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP takes as input a tab-delimited file that holds the unique sequence reads (tags) and their corresponding copy numbers generated by the Solexa sequencing platform. The input data pass through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP can also display miRNA expression levels from different jobs using a log2-scaled color matrix. Furthermore, a cross-species comparison function shows the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
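
    The first two DSAP steps (cleanup, then clustering of identical tags) can be illustrated with a minimal Python sketch. It is not DSAP's own code: it assumes a hypothetical 3' adaptor sequence, treats a "poly-A/T/C/G/N" tag as one made of a single repeated letter, and reads the tab-delimited tag/count format described above.

```python
from collections import Counter
from typing import Iterable

# Hypothetical 3' adaptor; the real sequence depends on the library kit used.
ADAPTOR = "TCGTATGCCGTCTTCTGCTTG"


def trim_adaptor(seq: str, adaptor: str = ADAPTOR, min_overlap: int = 6) -> str:
    """Cut the read at a full adaptor match, or strip a partial adaptor at the 3' end."""
    idx = seq.find(adaptor)
    if idx != -1:
        return seq[:idx]
    # Longest adaptor prefix (at least min_overlap bases) that the read ends with.
    for k in range(min(len(adaptor), len(seq)), min_overlap - 1, -1):
        if seq.endswith(adaptor[:k]):
            return seq[:-k]
    return seq


def is_homopolymer(seq: str) -> bool:
    """True for poly-A/T/C/G/N tags, i.e. tags made of one repeated letter."""
    return len(set(seq)) == 1


def clean_and_cluster(lines: Iterable[str]) -> Counter:
    """Step (i) cleanup of each 'tag<TAB>count' line, step (ii) merge identical tags."""
    clusters: Counter = Counter()
    for line in lines:
        tag, count = line.rstrip("\n").split("\t")
        cleaned = trim_adaptor(tag.upper())
        if not cleaned or is_homopolymer(cleaned):
            continue  # nothing left after trimming, or a poly-X artefact
        clusters[cleaned] += int(count)
    return clusters
```

    Adaptor matching here is exact; a production pipeline would typically tolerate sequencing errors and apply a minimum tag length before clustering.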

  18. Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

    PubMed Central

    Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

    2010-01-01

    Despite its economic, cultural, and biological importance, there has been no large-scale sequencing project to date for Camelus dromedarius. With the goal of sequencing the complete genome of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera checking, repeat masking, clustering, and assembly, we obtained 23,602 putative gene sequences, of which over 4,500 potentially novel or fast-evolving gene sequences show no homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases was obtained using Gene Ontology classification. Comparison to available full-length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show that more than 80% of the contigs have an ORF > 300 bp and ∼40% of hits extend to the start codons of full-length cDNAs, suggesting successful characterization of camel genes. Similarity analyses were done separately for different organisms including human, mouse, bovine, and rat. The accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools, with the option to add sequences from the public domain. We anticipate that our results will provide a home base for genomic studies of camel and other comparative studies, enabling a starting point for whole genome sequencing of the organism. PMID:20502665
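
    The "ORF > 300 bp" statistic can be computed along the lines of the sketch below. It assumes the simplest ORF definition (an ATG followed in frame by the first stop codon of the standard genetic code, on either strand), which may differ from the definition used in the study; partial ORFs that run off a contig end are ignored.

```python
from typing import Sequence

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def longest_orf(seq: str) -> int:
    """Length in bp (stop codon included) of the longest ATG...stop ORF on either strand."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open a reading frame at the first ATG
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, i + 3 - start)
                    start = None  # close it at the first in-frame stop
    return best


def fraction_with_long_orf(contigs: Sequence[str], min_len: int = 300) -> float:
    """Share of contigs whose longest ORF exceeds min_len bp."""
    if not contigs:
        return 0.0
    return sum(longest_orf(c) > min_len for c in contigs) / len(contigs)
```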

  19. Recently published protein sequences. I.

    NASA Technical Reports Server (NTRS)

    Jukes, T. H.; Holmquist, R.

    1972-01-01

    Some polypeptide sequences published in the 1972 scientific literature are listed; only selected sequences are included. The compilation has two objectives: to assemble current information in the intervals between the publication of more comprehensive compilations, and to encourage the use of data that do not include arrangements of unsequenced peptides for 'maximum homology'.

  20. The nucleotide sequence of 5S ribosomal RNA from Micrococcus lysodeikticus.

    PubMed Central

    Hori, H; Osawa, S; Murao, K; Ishikura, H

    1980-01-01

    The nucleotide sequence of ribosomal 5S RNA from Micrococcus lysodeikticus is pGUUACGGCGGCUAUAGCGUGGGGGAAACGCCCGGCCGUAUAUCGAACCCGGAAGCUAAGCCCCAUAGCGCCGAUGGUUACUGUAACCGGGAGGUUGUGGGAGAGUAGGUCGCCGCCGUGAOH. When compared to other 5S RNAs, the sequence homology is greatest with Thermus aquaticus, and these two 5S RNAs reveal several features intermediate between those of typical gram-positive bacteria and gram-negative bacteria. PMID:6780979

  1. Molecular evolution of an Avirulence Homolog (Avh) gene subfamily in Phytophthora ramorum

    Treesearch

    Goss, Erica M.; Press, Caroline M.; Grünwald, Niklaus J.

    2008-01-01

    Pathogen effectors can serve a virulence function on behalf of the pathogen or trigger a rapid defense response in resistant hosts. Sequencing of the Phytophthora ramorum genome and subsequent analysis identified a diverse superfamily of approximately 350 genes that are homologous to the four known avirulence genes in plant pathogenic oomycetes and...

  2. Variants of cellobiohydrolases

    DOEpatents

    Bott, Richard R.; Foukaraki, Maria; Hommes, Ronaldus Wilhelmus; Kaper, Thijs; Kelemen, Bradley R.; Kralj, Slavko; Nikolaev, Igor; Sandgren, Mats; Van Lieshout, Johannes Franciscus Thomas; Van Stigt Thans, Sander

    2018-04-10

    Disclosed are a number of homologs and variants of Hypocrea jecorina Cel7A (formerly Trichoderma reesei cellobiohydrolase I or CBH1), nucleic acids encoding the same, and methods for producing the same. The homologs and variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A in which one or more amino acid residues are substituted and/or deleted.

  3. Using SQL Databases for Sequence Similarity Searching and Analysis.

    PubMed

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.
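
    As a minimal illustration of summarizing stored search results with SQL, the sketch below uses Python's built-in sqlite3 module. The two-table schema, the accessions, and the E-values are hypothetical and are not the seqdb_demo or search_demo schemas described in the unit; the query reports the best hit per query protein and kingdom below an E-value threshold.

```python
import sqlite3

# Hypothetical schema; NOT the seqdb_demo/search_demo layout from the unit.
SCHEMA = """
CREATE TABLE protein (acc TEXT PRIMARY KEY, kingdom TEXT NOT NULL);
CREATE TABLE hit (query_acc TEXT NOT NULL,
                  subject_acc TEXT NOT NULL REFERENCES protein(acc),
                  evalue REAL NOT NULL);
"""


def best_hit_per_kingdom(con: sqlite3.Connection, max_evalue: float):
    """Lowest E-value per (query, kingdom) among hits passing the threshold."""
    return con.execute(
        """
        SELECT h.query_acc, p.kingdom, MIN(h.evalue)
        FROM hit AS h JOIN protein AS p ON p.acc = h.subject_acc
        WHERE h.evalue <= ?
        GROUP BY h.query_acc, p.kingdom
        ORDER BY h.query_acc, p.kingdom
        """,
        (max_evalue,),
    ).fetchall()


# Toy in-memory database with made-up records.
con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
con.executemany(
    "INSERT INTO protein VALUES (?, ?)",
    [("P1", "Bacteria"), ("P2", "Archaea"), ("P3", "Eukaryota"), ("P4", "Eukaryota")],
)
con.executemany(
    "INSERT INTO hit VALUES (?, ?, ?)",
    [("ecoli_A", "P1", 1e-50), ("ecoli_A", "P3", 1e-10), ("ecoli_A", "P4", 1e-20),
     ("ecoli_B", "P2", 1e-3), ("ecoli_B", "P1", 1e-8)],
)
```

    Restricting the join to one kingdom (for example, adding `AND p.kingdom = 'Eukaryota'`) gives the kind of taxonomic subset the abstract describes.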

  4. Restriction site polymorphism-based candidate gene mapping for seedling drought tolerance in cowpea [Vigna unguiculata (L.) Walp.].

    PubMed

    Muchero, Wellington; Ehlers, Jeffrey D; Roberts, Philip A

    2010-02-01

    Quantitative trait loci (QTL) studies provide insight into the complexity of drought tolerance mechanisms. Molecular markers used in these studies also allow for marker-assisted selection (MAS) in breeding programs, enabling transfer of genetic factors between breeding lines without complete knowledge of their exact nature. However, the potential for recombination between markers and target genes limits the utility of MAS-based strategies. Candidate gene mapping offers an alternative solution to identify trait determinants underlying QTL of interest. Here, we used restriction site polymorphisms to investigate co-location of candidate genes with QTL for seedling drought stress-induced premature senescence identified previously in cowpea. Genomic DNA isolated from 113 F(2:8) RILs of drought-tolerant IT93K503-1 and drought-susceptible CB46 genotypes was digested with combinations of EcoRI and HpaII, MseI, or MspI restriction enzymes and amplified with primers designed from 13 drought-responsive cDNAs. JoinMap 3.0 and MapQTL 4.0 software were used to incorporate polymorphic markers onto the AFLP map and to analyze their association with the drought response QTL. Seven markers co-located with peaks of previously identified QTL. Isolation, sequencing, and BLAST analysis of these markers confirmed their significant homology with drought or other abiotic stress-induced expressed sequence tags (EST) from cowpea and other plant systems. Further, homology with coding sequences for a multidrug resistance protein 3 and a photosystem I assembly protein ycf3 was revealed in two of these candidates. These results provide a platform for the identification and characterization of genetic trait determinants underlying seedling drought tolerance in cowpea.

  5. The light gene of Drosophila melanogaster encodes a homologue of VPS41, a yeast gene involved in cellular-protein trafficking.

    PubMed

    Warner, T S; Sinclair, D A; Fitzpatrick, K A; Singh, M; Devlin, R H; Honda, B M

    1998-04-01

    Mutations in a number of genes affect eye colour in Drosophila melanogaster; some of these "eye-colour" genes have been shown to be involved in various aspects of cellular transport processes. In addition, combinations of viable mutant alleles of some of these genes, such as carnation (car) combined with either light (lt) or deep-orange (dor) mutants, show lethal interactions. Recently, dor was shown to be homologous to the yeast gene PEP3 (VPS18), which is known to be involved in intracellular trafficking. We have undertaken to extend our earlier work on the lt gene, in order to examine its expression pattern in more detail and to characterize its gene product via sequencing of a cloned cDNA. The gene appears to be expressed at relatively high levels in all stages and tissues examined, and shows strong homology to VPS41, a gene involved in cellular-protein trafficking in yeast and higher eukaryotes. Further genetic experiments also point to a role for lt in transport processes: we describe lethal interactions between viable alleles of lt and dor, as well as phenotypic interactions (reductions in eye pigment) between alleles of lt and another eye-colour gene, garnet (g), whose gene product has close homology to a subunit of the human adaptor complex, AP-3.

  6. Identification of a Conserved Non-Protein-Coding Genomic Element that Plays an Essential Role in Alphabaculovirus Pathogenesis

    PubMed Central

    Kikhno, Irina

    2014-01-01

    Highly homologous sequences 154–157 bp in length grouped under the name of “conserved non-protein-coding element” (CNE) were revealed in all of the sequenced genomes of baculoviruses belonging to the genus Alphabaculovirus. A CNE alignment led to the detection of a set of highly conserved nucleotide clusters that occupy strictly conserved positions in the CNE sequence. The significant length of the CNE and conservation of both its length and cluster architecture were identified as a combination of characteristics that make this CNE different from known viral non-coding functional sequences. The essential role of the CNE in the Alphabaculovirus life cycle was demonstrated through the use of a CNE-knockout Autographa californica multiple nucleopolyhedrovirus (AcMNPV) bacmid. It was shown that the essential function of the CNE was not mediated by the presumed expression activities of the protein- and non-protein-coding genes that overlap the AcMNPV CNE. On the basis of the presented data, the AcMNPV CNE was categorized as a complex-structured, polyfunctional genomic element involved in an essential DNA transaction that is associated with an undefined function of the baculovirus genome. PMID:24740153

  7. Spliced synthetic genes as internal controls in RNA sequencing experiments.

    PubMed

    Hardwick, Simon A; Chen, Wendy Y; Wong, Ted; Deveson, Ira W; Blackburn, James; Andersen, Stacey B; Nielsen, Lars K; Mattick, John S; Mercer, Tim R

    2016-09-01

    RNA sequencing (RNA-seq) can be used to assemble spliced isoforms, quantify expressed genes and provide a global profile of the transcriptome. However, the size and diversity of the transcriptome, the wide dynamic range in gene expression and inherent technical biases confound RNA-seq analysis. We have developed a set of spike-in RNA standards, termed 'sequins' (sequencing spike-ins), that represent full-length spliced mRNA isoforms. Sequins have an entirely artificial sequence with no homology to natural reference genomes, but they align to gene loci encoded on an artificial in silico chromosome. The combination of multiple sequins across a range of concentrations emulates alternative splicing and differential gene expression, and it provides scaling factors for normalization between samples. We demonstrate the use of sequins in RNA-seq experiments to measure sample-specific biases and determine the limits of reliable transcript assembly and quantification in accompanying human RNA samples. In addition, we have designed a complementary set of sequins that represent fusion genes arising from rearrangements of the in silico chromosome to aid in cancer diagnosis. RNA sequins provide a qualitative and quantitative reference with which to navigate the complexity of the human transcriptome.

  8. Tomato immune receptor Ve1 recognizes effector of multiple fungal pathogens uncovered by genome and RNA sequencing

    PubMed Central

    de Jonge, Ronnie; van Esse, H. Peter; Maruthachalam, Karunakaran; Bolton, Melvin D.; Santhanam, Parthasarathy; Saber, Mojtaba Keykha; Zhang, Zhao; Usami, Toshiyuki; Lievens, Bart; Subbarao, Krishna V.; Thomma, Bart P. H. J.

    2012-01-01

    Fungal plant pathogens secrete effector molecules to establish disease on their hosts, and plants in turn use immune receptors to try to intercept these effectors. The tomato immune receptor Ve1 governs resistance to race 1 strains of the soil-borne vascular wilt fungi Verticillium dahliae and Verticillium albo-atrum, but the corresponding Verticillium effector had remained unknown. By high-throughput population genome sequencing, a single 50-kb sequence stretch was identified that occurs only in race 1 strains, and subsequent transcriptome sequencing of Verticillium-infected Nicotiana benthamiana plants revealed only a single highly expressed ORF in this region, designated Ave1 (for Avirulence on Ve1 tomato). Functional analyses confirmed that Ave1 activates Ve1-mediated resistance and demonstrated that Ave1 markedly contributes to fungal virulence, not only on tomato but also on Arabidopsis. Interestingly, Ave1 is homologous to a widespread family of plant natriuretic peptides. Besides plants, homologous proteins were found only in the bacterial plant pathogen Xanthomonas axonopodis and the plant pathogenic fungi Colletotrichum higginsianum, Cercospora beticola, and Fusarium oxysporum f. sp. lycopersici. The distribution of Ave1 homologs, together with the presence of Ave1 within a flexible genomic region, strongly suggests that Verticillium acquired Ave1 from plants through horizontal gene transfer. Remarkably, by transient expression we show that the Ave1 homologs from F. oxysporum and C. beticola can also activate Ve1-mediated resistance. In line with this observation, Ve1 was found to mediate resistance toward F. oxysporum in tomato, showing that this immune receptor is involved in resistance against multiple fungal pathogens. PMID:22416119

  9. Genomic Sequencing and Characterization of Cynomolgus Macaque Cytomegalovirus

    PubMed Central

    Marsh, Angie K.; Willer, David O.; Ambagala, Aruna P. N.; Dzamba, Misko; Chan, Jacqueline K.; Pilon, Richard; Fournier, Jocelyn; Sandstrom, Paul; Brudno, Michael; MacDonald, Kelly S.

    2011-01-01

    Cytomegalovirus (CMV) infection is the most common opportunistic infection in immunosuppressed individuals, such as transplant recipients or people living with HIV/AIDS, and congenital CMV is the leading viral cause of developmental disabilities in infants. Because of the highly species-specific nature of CMV, animal models that closely recapitulate human CMV (HCMV) are of growing importance for vaccine development. Here we present the genomic sequence of a novel nonhuman primate CMV from cynomolgus macaques (Macaca fascicularis; CyCMV). CyCMV (Ottawa strain) was isolated from the urine of a healthy, captive-bred, 4-year-old cynomolgus macaque of Philippine origin, and the viral genome was sequenced using next-generation Illumina sequencing to an average of 516-fold coverage. The CyCMV genome is 218,041 bp in length, with 49.5% G+C content and 84% protein-coding density. We have identified 262 putative open reading frames (ORFs) with an average coding length of 789 bp. The genomic organization of CyCMV is largely colinear with that of rhesus macaque CMV (RhCMV). Of the 262 CyCMV ORFs, 137 are homologous to HCMV genes, 243 are homologous to RhCMV strain 68.1, and 200 are homologous to RhCMV strain 180.92. CyCMV encodes four ORFs that are not present in RhCMV strain 68.1 or 180.92 but have homologies with HCMV (UL30, UL74A, UL126, and UL146). Similar to HCMV, CyCMV does not produce the RhCMV-specific viral homologue of cyclooxygenase-2. This newly characterized CMV may provide a novel model in which to study CMV biology and HCMV vaccine development. PMID:21994460

  10. GeneBuilder: interactive in silico prediction of gene structure.

    PubMed

    Milanesi, L; D'Angelo, D; Rogozin, I B

    1999-01-01

    Prediction of gene structure in newly sequenced DNA has become very important in large genome sequencing projects. This problem is complicated by the exon-intron structure of eukaryotic genes and because gene expression is regulated by many different short nucleotide domains. In order to analyse the full gene structure in different organisms, it is necessary to combine information about potential functional signals (promoter region, splice sites, start and stop codons, 3' untranslated region) with the statistical properties of coding sequences (coding potential), information about homologous proteins, ESTs and repeated elements. We have developed the GeneBuilder system, which is based on prediction of functional signals and coding regions by different approaches in combination with similarity searches in protein and EST databases. The potential gene structure models are obtained by using a dynamic programming method. The program permits the use of several parameters for gene structure prediction and refinement. During gene model construction, selecting different exon homology levels with a protein sequence selected from a list of homologous proteins can improve the accuracy of the gene structure prediction. In the case of low homology, GeneBuilder is still able to predict the gene structure. The GeneBuilder system has been tested using the standard set (Burset and Guigo, Genomics, 34, 353-367, 1996), with a nucleotide-level performance of 0.89 sensitivity and 0.91 specificity. The total correlation coefficient is 0.88. The GeneBuilder system is implemented as part of WebGene at the URL http://www.itba.mi.cnr.it/webgene and of the TRADAT (TRAncription Database and Analysis Tools) launcher at the URL http://www.itba.mi.cnr.it/tradat.
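
    The nucleotide-level figures quoted above can be reproduced from per-base labels, assuming the standard definitions of the Burset and Guigó evaluation: sensitivity Sn = TP/(TP+FN), specificity Sp = TP/(TP+FP), and correlation coefficient CC = (TP·TN − FN·FP)/√((TP+FN)(TN+FP)(TP+FP)(TN+FN)). A short sketch, where True marks a coding base:

```python
import math
from typing import Sequence, Tuple


def _ratio(num: int, den: int) -> float:
    return num / den if den else 0.0


def nucleotide_accuracy(actual: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (Sn, Sp, CC) over per-nucleotide coding (True) / non-coding (False) labels."""
    if len(actual) != len(predicted):
        raise ValueError("label sequences must have equal length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # coding, predicted coding
    fn = sum(1 for a, p in pairs if a and not p)      # coding, missed
    fp = sum(1 for a, p in pairs if not a and p)      # non-coding, predicted coding
    tn = sum(1 for a, p in pairs if not a and not p)  # non-coding, predicted non-coding
    sn = _ratio(tp, tp + fn)
    sp = _ratio(tp, tp + fp)
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fn * fp) / denom if denom else 0.0
    return sn, sp, cc
```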

  11. Characterization of a periplasmic S1-like nuclease coded by the Mesorhizobium loti symbiosis island

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pimkin, Maxim; Miller, C. Glenn; Blakesley, Lauryn

    DNA sequences encoding hypothetical proteins homologous to S1 nuclease from Aspergillus oryzae are found in many organisms including fungi, plants, pathogenic bacteria, and eukaryotic parasites. One of these is the M1 nuclease of Mesorhizobium loti which we demonstrate herein to be an enzymatically active, soluble, and stable S1 homolog that lacks the extensive mannosyl-glycosylation found in eukaryotic S1 nuclease homologs. We have expressed the cloned M1 protein in M. loti and purified recombinant native M1 to near homogeneity and have also isolated a homogeneous M1 carboxy-terminal hexahistidine tag fusion protein. Mass spectrometry and N-terminal Edman degradation sequencing confirmed the protein identity. The enzymatic properties of the purified M1 nuclease are similar to those of S1. At acidic pH M1 is 25 times more active on single-stranded DNA than on double-stranded DNA and 3 times more active on single-stranded DNA than on single-stranded RNA. At neutral pH the RNase activity of M1 exceeds the DNase activity. M1 nicks supercoiled RF-I plasmid DNA and rapidly cuts the phosphodiester bond across from the nick in the resultant relaxed RF-II plasmid DNA. Therefore, M1 represents an active bacterial S1 homolog in spite of great sequence divergence. The biochemical characterization of M1 nuclease supports our sequence alignment that reveals the minimal 21 amino acid residues that are necessarily conserved for the structure and functions of this enzyme family. The ability of M1 to degrade RNA at neutral pH implies previously unappreciated roles of these nucleases in biological systems.

  12. Molecular cloning, sequence identification and tissue expression profile of three novel sheep (Ovis aries) genes - BCKDHA, NAGA and HEXA.

    PubMed

    Liu, G Y; Gao, S Z

    2009-01-01

    The complete coding sequences of three sheep genes, BCKDHA, NAGA and HEXA, were amplified using reverse transcriptase polymerase chain reaction (RT-PCR), based on conserved sequence information from the mouse or other mammals. The nucleotide sequences of these three genes revealed that the sheep BCKDHA gene encodes a protein of 313 amino acids which has high homology with the BCKDHA gene that encodes a protein of 447 amino acids that has high homology with the branched chain keto acid dehydrogenase E1, alpha polypeptide (BCKDHA) of five species: chimpanzee (93%), human (96%), crab-eating macaque (93%), bovine (98%) and mouse (91%). The sheep NAGA gene encodes a protein of 411 amino acids that has high homology with the alpha-N-acetylgalactosaminidase (NAGA) of five species: human (85%), bovine (94%), mouse (91%), rat (83%) and chicken (74%). The sheep HEXA gene encodes a protein of 529 amino acids that has high homology with the hexosaminidase A (HEXA) of five species: bovine (98%), human (84%), Bornean orangutan (84%), rat (80%) and mouse (81%). Finally, these three novel sheep genes were assigned GeneIDs 100145857, 100145858 and 100145856. Phylogenetic tree analysis revealed that the sheep BCKDHA, NAGA, and HEXA genes are all most closely related to their bovine counterparts. Tissue expression profile analysis was also carried out, and the results revealed that the sheep BCKDHA, NAGA and HEXA genes were differentially expressed in tissues including muscle, heart, liver, fat, kidney, lung, small intestine and large intestine. Our experiment is the first to establish a foundation for further research on these three sheep genes.

  13. PigGIS: Pig Genomic Informatics System

    PubMed Central

    Ruan, Jue; Guo, Yiran; Li, Heng; Hu, Yafeng; Song, Fei; Huang, Xin; Kristiensen, Karsten; Bolund, Lars; Wang, Jun

    2007-01-01

    Pig Genomic Information System (PigGIS) is a web-based depository of pig (Sus scrofa) genomic learning mainly engineered for biomedical research to locate pig genes from their human homologs and position single nucleotide polymorphisms (SNPs) in different pig populations. It utilizes a variety of sequence data, including whole genome shotgun (WGS) reads and expressed sequence tags (ESTs), and achieves a successful mapping solution to the low-coverage genome problem. With the data presently available, we have identified a total of 15 700 pig consensus sequences covering 18.5 Mb of the homologous human exons. We have also recovered 18 700 SNPs and 20 800 unique 60mer oligonucleotide probes for future pig genome analyses. PigGIS can be freely accessed via the web at and . PMID:17090590

  14. Submegabase Clusters of Unstable Tandem Repeats Unique to the Tla Region of Mouse T Haplotypes

    PubMed Central

    Uehara, H.; Ebersole, T.; Bennett, D.; Artzt, K.

    1990-01-01

    We describe here the identification and genomic organization of mouse t haplotype-specific elements (TSEs) 7.8 and 5.8 kb in length. The TSEs exist as submegabase-long clusters of tandem repeats localized in the Tla region of the major histocompatibility complex of all t haplotype chromosomes examined. In contrast, no such clusters were detected among 12 inbred strains of Mus musculus and other Mus species; thus, clusters of TSEs represent the first absolutely qualitative difference between t haplotypes and wild-type chromosomes. Pulsed field gel electrophoresis shows that the number of clusters and the number of repeats in each cluster are extremely variable. Dramatic quantitative differences in TSEs uniquely distinguish every independent t haplotype from every other. The complete nucleotide sequence of one 7.8-kb TSE reveals significant homology to the ETn (a major transcript in the early embryo of the mouse), and some homologies to intracisternal A-particles and the mammary tumor virus env gene. Apart from the diagnostic relevance to t haplotypes, the evolutionary and functional significance of TSEs is discussed with respect to chromosome structure and genetic recombination. PMID:2076812

  15. Structural Rearrangement in an RsmA/CsrA Ortholog of Pseudomonas aeruginosa Creates a Dimeric RNA-Binding Protein, RsmN

    PubMed Central

    Morris, Elizabeth R.; Hall, Gareth; Li, Chan; Heeb, Stephan; Kulkarni, Rahul V.; Lovelock, Laura; Silistre, Hazel; Messina, Marco; Cámara, Miguel; Emsley, Jonas; Williams, Paul; Searle, Mark S.

    2013-01-01

    In bacteria, members of the highly conserved RsmA/CsrA family of RNA-binding proteins function as global posttranscriptional regulators acting on mRNA translation and stability. Through phenotypic complementation of an rsmA mutant in Pseudomonas aeruginosa, we discovered a family member, termed RsmN. Elucidation of the RsmN crystal structure and that of its complex with a hairpin from the sRNA RsmZ reveals a uniquely inserted α helix, which redirects the polypeptide chain to form a protein fold distinctly different from the domain-swapped dimeric structure of RsmA homologs. The overall β sheet structure required for RNA recognition is, however, preserved with compensatory sequence and structure differences, allowing the RsmN dimer to target binding motifs in both structured hairpin loops and flexible disordered RNAs. Phylogenetic analysis indicates that, although RsmN appears unique to P. aeruginosa, homologous proteins with the inserted α helix are more widespread and arose as a consequence of a gene duplication event. PMID:23954502

  16. Taxonomic distribution and origins of the extended LHC (light-harvesting complex) antenna protein superfamily

    PubMed Central

    2010-01-01

    Background The extended light-harvesting complex (LHC) protein superfamily is a centerpiece of eukaryotic photosynthesis, comprising the LHC family and several families involved in photoprotection, like the LHC-like and the photosystem II subunit S (PSBS). The evolution of this complex superfamily has long remained elusive, partly because of previously missing families. Results In this study we present a meticulous search for LHC-like sequences in public genome and expressed sequence tag databases covering twelve representative photosynthetic eukaryotes from the three primary lineages of plants (Plantae): glaucophytes, red algae and green plants (Viridiplantae). By introducing a coherent classification of the different protein families based on both hidden Markov model analyses and structural predictions, numerous new LHC-like sequences were identified and several new families were described, including the red lineage chlorophyll a/b-binding-like protein (RedCAP) family from red algae and diatoms. The test of alternative topologies of sequences of the highly conserved chlorophyll-binding core structure of LHC and PSBS proteins significantly supports the independent origins of LHC and PSBS families via two unrelated internal gene duplication events. This result was confirmed by the application of cluster likelihood mapping. Conclusions The independent evolution of LHC and PSBS families is supported by strong phylogenetic evidence. In addition, a possible origin of LHC and PSBS families from different homologous members of the stress-enhanced protein subfamily, a diverse and anciently paralogous group of two-helix proteins, seems likely. The new hypothesis for the evolution of the extended LHC protein superfamily proposed here is in agreement with the character evolution analysis that incorporates the distribution of families and subfamilies across taxonomic lineages.
Intriguingly, stress-enhanced proteins, which are universally found in the genomes of green plants, red algae, glaucophytes and in diatoms with complex plastids, could represent an important and previously missing link in the evolution of the extended LHC protein superfamily. PMID:20673336

  17. Automated design of genomic Southern blot probes

    PubMed Central

    2010-01-01

    Background Southern blotting is a DNA analysis technique that has found widespread application in molecular biology. It has been used for gene discovery and mapping and has diagnostic and forensic applications, including mutation detection in patient samples and DNA fingerprinting in criminal investigations. Southern blotting has been employed as the definitive method for detecting transgene integration and successful homologous recombination in gene targeting experiments. The technique employs a labeled DNA probe to detect a specific DNA sequence in a complex DNA sample that has been separated by restriction digestion and gel electrophoresis. Critically, for the technique to succeed, the probe must be unique to the target locus so as not to cross-hybridize to other endogenous DNA within the sample. Investigators routinely employ a manual approach to probe design. A genome browser is used to extract DNA sequence from the locus of interest, which is searched against the target genome using a BLAST-like tool. Ideally, a single perfect match to the target is obtained, with little cross-reactivity caused by homologous DNA sequence present in the genome and/or repetitive and low-complexity elements in the candidate probe. This is a labor-intensive process, often requiring several attempts to find a suitable probe for laboratory testing. Results We have written an informatic pipeline to automatically design genomic Southern blot probes that specifically attempts to optimize the resultant probe, employing a brute-force strategy of generating many candidate probes of acceptable length in the user-specified design window, searching all against the target genome, then scoring and ranking the candidates by uniqueness and repetitive DNA element content. Using these in silico measures, we can automatically design probes that we predict will perform as well as, or better than, our previous manual designs, while considerably reducing design time.
We went on to experimentally validate a number of these automated designs by Southern blotting. The majority of probes we tested performed well confirming our in silico prediction methodology and the general usefulness of the software for automated genomic Southern probe design. Conclusions Software and supplementary information are freely available at: http://www.genes2cognition.org/software/southern_blot PMID:20113467
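    The brute-force design strategy described above (enumerate candidates in a window, search each against the genome, rank by uniqueness and repeat content) can be sketched in a few lines. This is a minimal illustration, not the published pipeline: it assumes exact k-mer matching as a stand-in for a BLAST-like search, treats lowercase (soft-masked) bases as repetitive DNA, and uses hypothetical function names and parameters (`design_probes`, `k`, `step`).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    start: int              # 0-based start of the probe in the genome
    seq: str                # probe sequence as it appears in the genome
    off_target_hits: int    # exact k-mer matches elsewhere in the genome
    repeat_fraction: float  # fraction of soft-masked (lowercase) bases


def kmer_index(genome: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer (case-insensitive) to the positions where it occurs."""
    index: dict[str, list[int]] = {}
    upper = genome.upper()
    for i in range(len(upper) - k + 1):
        index.setdefault(upper[i:i + k], []).append(i)
    return index


def count_off_target_hits(start: int, probe: str,
                          index: dict[str, list[int]], k: int) -> int:
    """Count k-mer matches of the probe anywhere except its own locus
    (a crude stand-in for a BLAST-like search, assumed for illustration)."""
    probe = probe.upper()
    hits = 0
    for j in range(len(probe) - k + 1):
        for pos in index.get(probe[j:j + k], []):
            if pos != start + j:  # skip the probe's own position
                hits += 1
    return hits


def design_probes(genome: str, window_start: int, window_end: int,
                  length: int, k: int = 8, step: int = 1) -> list[Candidate]:
    """Enumerate every probe of `length` bases inside [window_start, window_end)
    and rank them: fewest off-target hits first, then least repeat content."""
    index = kmer_index(genome, k)
    candidates = []
    for s in range(window_start, window_end - length + 1, step):
        seq = genome[s:s + length]
        repeat_fraction = sum(c.islower() for c in seq) / length
        hits = count_off_target_hits(s, seq, index, k)
        candidates.append(Candidate(s, seq, hits, repeat_fraction))
    candidates.sort(key=lambda c: (c.off_target_hits, c.repeat_fraction, c.start))
    return candidates
```

    A real design tool would replace the k-mer lookup with a proper alignment search and would also weigh probe length and GC content; the ranking key here simply puts uniqueness before repeat content, mirroring the order in which the abstract lists the two criteria.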

  18. How to Choose the Suitable Template for Homology Modelling of GPCRs: 5-HT7 Receptor as a Test Case.

    PubMed

    Shahaf, Nir; Pappalardo, Matteo; Basile, Livia; Guccione, Salvatore; Rayan, Anwar

    2016-09-01

    G protein-coupled receptors (GPCRs) are a superfamily of membrane proteins that attract great pharmaceutical interest due to their involvement in almost every physiological activity, including responses to extracellular stimuli, neurotransmission, and hormone regulation. Currently, structural information on many GPCRs is obtained mainly by computer modelling in general and by homology modelling in particular. Based on a quantitative analysis of eighteen antagonist-bound, resolved structures of rhodopsin family "A" receptors (also used as templates to build 153 homology models), it was concluded that higher sequence identity between two receptors does not guarantee a lower RMSD between their structures, especially when their pairwise sequence identity (within the transmembrane domain and/or in the binding pocket) lies between 25% and 40%. This study suggests that all template receptors with a sequence identity ≤50% to the query receptor should be considered. In fact, most GPCRs fall within this identity range relative to the currently available resolved GPCR structures, a range in which structural similarity does not correlate with sequence identity. When suitability for structure-based drug design was tested, choosing as the template the most similar resolved protein, based on sequence resemblance alone, led to unsound results in many cases. Molecular docking analyses were carried out, and enrichment factors and attrition rates were used as criteria for assessing suitability for structure-based drug design. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
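    The template-selection advice above hinges on pairwise sequence identity. The sketch below is illustrative only, with several assumptions. The query and templates are taken as already aligned to equal length, with '-' for gaps. Identity is computed over columns where neither sequence has a gap; other common definitions divide by query length or alignment length and give different numbers. The helper names and the `max_identity` parameter are hypothetical. Rather than keeping only the single most identical template, it keeps every template at or below the cutoff for downstream assessment.

```python
def percent_identity(a: str, b: str) -> float:
    """Identity (%) over gap-free columns of two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def shortlist_templates(query: str, templates: dict[str, str],
                        max_identity: float = 50.0) -> list[tuple[str, float]]:
    """Keep every template at or below `max_identity` (hypothetical cutoff
    parameter), sorted from most to least identical, so each candidate can be
    evaluated downstream (e.g. by docking enrichment) instead of trusting
    sequence identity alone."""
    scored = [(name, percent_identity(query, seq)) for name, seq in templates.items()]
    kept = [(name, pid) for name, pid in scored if pid <= max_identity]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

    For example, with the query `"ACDEFGHIKL"`, a template identical to the query is excluded, while templates at 40% and 20% identity are both retained.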

  19. Progress of targeted genome modification approaches in higher plants.

    PubMed

    Cardi, Teodoro; Neal Stewart, C

    2016-07-01

    Transgene integration in plants is based on illegitimate recombination between non-homologous sequences. Poor control over the integration site and the number of (trans/cis)gene copies may have negative consequences for the expression of transferred genes and their insertion within endogenous coding sequences. The first experiments using precise homologous recombination for gene integration began soon after the first demonstration that transgenic plants could be produced. Modern transgene targeting categories used in plant biology are: (a) homologous recombination-dependent gene targeting; (b) recombinase-mediated site-specific gene integration; (c) oligonucleotide-directed mutagenesis; (d) nuclease-mediated site-specific genome modifications. New tools enable precise gene replacement or stacking with exogenous sequences and targeted mutagenesis of endogenous sequences. The possibility of engineering chimeric designer nucleases, which are able to target virtually any genomic site, and of using them to induce double-strand breaks in host DNA creates new opportunities for both applied plant breeding and functional genomics. CRISPR is the most recent technology available for precise genome editing. Its rapid adoption in biological research is based on its inherent simplicity and efficacy. Its utilization, however, depends on available sequence information, especially for genome-wide analysis. We will review the approaches used for genome modification, specifically those affecting gene integration and modification in higher plants. For each approach, the advantages and limitations will be noted. We will also speculate on how their actual commercial development and implementation in plant breeding will be affected by governmental regulations.

  20. Micropathogen Community Analysis in Hyalomma rufipes via High-Throughput Sequencing of Small RNAs

    PubMed Central

    Luo, Jin; Liu, Min-Xuan; Ren, Qiao-Yun; Chen, Ze; Tian, Zhan-Cheng; Hao, Jia-Wei; Wu, Feng; Liu, Xiao-Cui; Luo, Jian-Xun; Yin, Hong; Wang, Hui; Liu, Guang-Yuan

    2017-01-01

    Ticks are important vectors in the transmission of a broad range of micropathogens to vertebrates, including humans. Because of the role of ticks in disease transmission, identifying and characterizing the micropathogen profiles of tick populations have become increasingly important. The objective of this study was to survey the micropathogens of Hyalomma rufipes ticks. Illumina HiSeq2000 technology was utilized to perform deep sequencing of small RNAs (sRNAs) extracted from field-collected H. rufipes ticks in Gansu Province, China. The resultant sRNA library data revealed that the surveyed tick populations produced reads that were homologous to St. Croix River Virus (SCRV) sequences. We also observed many reads that were homologous to microbial and/or pathogenic isolates, including bacteria, protozoa, and fungi. As part of this analysis, a phylogenetic tree was constructed to display the relationships among the homologous sequences that were identified. The study offered a unique opportunity to gain insight into the micropathogens of H. rufipes ticks. The effective control of arthropod vectors in the future will require knowledge of the micropathogen composition of vectors harboring infectious agents. Understanding the ecological factors that regulate vector propagation in association with the prevalence and persistence of micropathogen lineages is also imperative. These interactions may affect the evolution of micropathogen lineages, especially if the micropathogens rely on the vector or host for dispersal. The sRNA deep-sequencing approach used in this analysis provides an intuitive method to survey micropathogen prevalence in ticks and other vector species. PMID:28861401

  1. Yeast Ran-Binding Protein 1 (Yrb1) Shuttles between the Nucleus and Cytoplasm and Is Exported from the Nucleus via a CRM1 (XPO1)-Dependent Pathway

    PubMed Central

    Künzler, Markus; Gerstberger, Thomas; Stutz, Françoise; Bischoff, F. Ralf; Hurt, Ed

    2000-01-01

    The RanGTP-binding protein RanBP1, which is located in the cytoplasm, has been implicated in the release of nuclear export complexes from the cytoplasmic side of the nuclear pore complex. Here we show that Yrb1 (the yeast homolog of RanBP1) shuttles between the nucleus and the cytoplasm. Nuclear import of Yrb1 is a facilitated process that requires a short basic sequence within the Ran-binding domain (RBD). By contrast, nuclear export of Yrb1 requires an intact RBD, which forms a ternary complex with the Xpo1 (Crm1) NES receptor in the presence of RanGTP. Nuclear export of Yrb1, however, is insensitive to leptomycin B, suggesting a novel type of substrate recognition between Yrb1 and Xpo1. Taken together, these data suggest that ongoing nuclear import and export are an important feature of Yrb1 function in vivo. PMID:10825193

  2. The Enhancer of split complex arose prior to the diversification of schizophoran flies and is strongly conserved between Drosophila and stalk-eyed flies (Diopsidae)

    PubMed Central

    2011-01-01

    Background In Drosophila, the Enhancer of split complex (E(spl)-C) comprises 11 bHLH and Bearded genes that function during Notch signaling to repress proneural identity in the developing peripheral nervous system. Comparison with other insects indicates that the basal state for Diptera is a single bHLH and Bearded homolog and that the expansion of the gene complex occurred in the lineage leading to Drosophila. However, comparative genomic data from other fly species that would elucidate the origin and sequence of gene duplication for the complex are lacking. Therefore, to examine the evolutionary history of the complex within Diptera, we reconstructed the entire E(spl)-complex in the stalk-eyed fly, Teleopsis dalmanni, using several fosmid clones, and collected additional homologs of E(spl)-C genes from searches of dipteran EST databases and the Glossina morsitans genome assembly. Results Comparison of the Teleopsis E(spl)-C gene organization with Drosophila indicates complete conservation in gene number and orientation between the species, except that T. dalmanni contains a duplicated copy of E(spl)m5 that is not present in Drosophila. Phylogenetic analysis of E(spl)-complex bHLH and Bearded genes for several dipteran species clearly demonstrates that all members of the complex were present prior to the diversification of schizophoran flies. Comparison of upstream regulatory elements and 3' UTR domains between the species also reveals strong conservation for many of the genes and identifies several novel characteristics of E(spl)-C regulatory evolution, including the discovery of a previously unidentified, highly conserved SPS+A domain between E(spl)mγ and E(spl)mβ. Conclusion Identifying the phylogenetic origin of E(spl)-C genes and their associated regulatory DNA is essential to understanding the functional significance of this well-studied gene complex. Results from this study provide numerous insights into the evolutionary history of the complex and will help refine the focus of studies examining the adaptive consequences of this gene expansion. PMID:22151427

  3. Increasing the structural coverage of tuberculosis drug targets.

    PubMed

    Baugh, Loren; Phan, Isabelle; Begley, Darren W; Clifton, Matthew C; Armour, Brianna; Dranow, David M; Taylor, Brandy M; Muruthi, Marvin M; Abendroth, Jan; Fairman, James W; Fox, David; Dieterich, Shellie H; Staker, Bart L; Gardberg, Anna S; Choi, Ryan; Hewitt, Stephen N; Napuli, Alberto J; Myers, Janette; Barrett, Lynn K; Zhang, Yang; Ferrell, Micah; Mundt, Elizabeth; Thompkins, Katie; Tran, Ngoc; Lyons-Abbott, Sally; Abramov, Ariel; Sekar, Aarthi; Serbzhinskiy, Dmitri; Lorimer, Don; Buchko, Garry W; Stacy, Robin; Stewart, Lance J; Edwards, Thomas E; Van Voorhis, Wesley C; Myler, Peter J

    2015-03-01

    High-resolution three-dimensional structures of essential Mycobacterium tuberculosis (Mtb) proteins provide templates for TB drug design, but are available for only a small fraction of the Mtb proteome. Here we evaluate an intra-genus "homolog-rescue" strategy to increase the structural information available for TB drug discovery by using mycobacterial homologs with conserved active sites. Of 179 potential TB drug targets selected for x-ray structure determination, only 16 yielded a crystal structure. By adding 1675 homologs from nine other mycobacterial species to the pipeline, structures representing an additional 52 otherwise intractable targets were solved. To determine whether these homolog structures would be useful surrogates in TB drug design, we compared the active sites of 106 pairs of Mtb and non-TB mycobacterial (NTM) enzyme homologs with experimentally determined structures, using three metrics of active site similarity, including superposition of continuous pharmacophoric property distributions. Pair-wise structural comparisons revealed that 19/22 pairs with >55% overall sequence identity had active site Cα RMSD <1 Å, >85% side chain identity, and ≥80% PSAPF (similarity based on pharmacophoric properties) indicating highly conserved active site shape and chemistry. Applying these results to the 52 NTM structures described above, 41 shared >55% sequence identity with the Mtb target, thus increasing the effective structural coverage of the 179 Mtb targets over three-fold (from 9% to 32%). The utility of these structures in TB drug design can be tested by designing inhibitors using the homolog structure and assaying the cognate Mtb enzyme; a promising test case, Mtb cytidylate kinase, is described. The homolog-rescue strategy evaluated here for TB is also generalizable to drug targets for other diseases. Copyright © 2014 Elsevier Ltd. All rights reserved.
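    The coverage figures quoted above can be checked with simple arithmetic. Reading the 32% as (16 directly solved targets + 41 rescued homologs sharing >55% identity) / 179 targets is our interpretation; the abstract does not spell out the formula.

```python
targets = 179   # Mtb targets selected for x-ray structure determination
direct = 16     # targets that yielded an Mtb crystal structure
rescued = 41    # NTM homolog structures with >55% identity to the Mtb target

before = direct / targets               # ~0.089
after = (direct + rescued) / targets    # ~0.318
print(f"coverage {before:.0%} -> {after:.0%}, {after / before:.2f}-fold")
# prints: coverage 9% -> 32%, 3.56-fold
```

    The ratio of about 3.6 is what the abstract summarizes as "over three-fold".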

  4. Increasing the structural coverage of tuberculosis drug targets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baugh, Loren; Phan, Isabelle; Begley, Darren W.

    High-resolution three-dimensional structures of essential Mycobacterium tuberculosis (Mtb) proteins provide templates for TB drug design, but are available for only a small fraction of the Mtb proteome. Here we evaluate an intra-genus “homolog-rescue” strategy to increase the structural information available for TB drug discovery by using mycobacterial homologs with conserved active sites. We found that of 179 potential TB drug targets selected for x-ray structure determination, only 16 yielded a crystal structure. By adding 1675 homologs from nine other mycobacterial species to the pipeline, structures representing an additional 52 otherwise intractable targets were solved. To determine whether these homolog structures would be useful surrogates in TB drug design, we compared the active sites of 106 pairs of Mtb and non-TB mycobacterial (NTM) enzyme homologs with experimentally determined structures, using three metrics of active site similarity, including superposition of continuous pharmacophoric property distributions. Pair-wise structural comparisons revealed that 19/22 pairs with >55% overall sequence identity had active site Cα RMSD <1 Å, >85% side chain identity, and ≥80% PSAPF (similarity based on pharmacophoric properties) indicating highly conserved active site shape and chemistry. Applying these results to the 52 NTM structures described above, 41 shared >55% sequence identity with the Mtb target, thus increasing the effective structural coverage of the 179 Mtb targets over three-fold (from 9% to 32%). The utility of these structures in TB drug design can be tested by designing inhibitors using the homolog structure and assaying the cognate Mtb enzyme; a promising test case, Mtb cytidylate kinase, is described. The homolog-rescue strategy evaluated here for TB is also generalizable to drug targets for other diseases.

  5. Increasing the structural coverage of tuberculosis drug targets

    DOE PAGES

    Baugh, Loren; Phan, Isabelle; Begley, Darren W.; ...

    2014-12-19

    High-resolution three-dimensional structures of essential Mycobacterium tuberculosis (Mtb) proteins provide templates for TB drug design, but are available for only a small fraction of the Mtb proteome. Here we evaluate an intra-genus “homolog-rescue” strategy to increase the structural information available for TB drug discovery by using mycobacterial homologs with conserved active sites. We found that of 179 potential TB drug targets selected for x-ray structure determination, only 16 yielded a crystal structure. By adding 1675 homologs from nine other mycobacterial species to the pipeline, structures representing an additional 52 otherwise intractable targets were solved. To determine whether these homolog structures would be useful surrogates in TB drug design, we compared the active sites of 106 pairs of Mtb and non-TB mycobacterial (NTM) enzyme homologs with experimentally determined structures, using three metrics of active site similarity, including superposition of continuous pharmacophoric property distributions. Pair-wise structural comparisons revealed that 19/22 pairs with >55% overall sequence identity had active site Cα RMSD <1 Å, >85% side chain identity, and ≥80% PSAPF (similarity based on pharmacophoric properties) indicating highly conserved active site shape and chemistry. Applying these results to the 52 NTM structures described above, 41 shared >55% sequence identity with the Mtb target, thus increasing the effective structural coverage of the 179 Mtb targets over three-fold (from 9% to 32%). The utility of these structures in TB drug design can be tested by designing inhibitors using the homolog structure and assaying the cognate Mtb enzyme; a promising test case, Mtb cytidylate kinase, is described. The homolog-rescue strategy evaluated here for TB is also generalizable to drug targets for other diseases.

  6. Increasing the Structural Coverage of Tuberculosis Drug Targets

    PubMed Central

    Baugh, Loren; Phan, Isabelle; Begley, Darren W.; Clifton, Matthew C.; Armour, Brianna; Dranow, David M.; Taylor, Brandy M.; Muruthi, Marvin M.; Abendroth, Jan; Fairman, James W.; Fox, David; Dieterich, Shellie H.; Staker, Bart L.; Gardberg, Anna S.; Choi, Ryan; Hewitt, Stephen N.; Napuli, Alberto J.; Myers, Janette; Barrett, Lynn K.; Zhang, Yang; Ferrell, Micah; Mundt, Elizabeth; Thompkins, Katie; Tran, Ngoc; Lyons-Abbott, Sally; Abramov, Ariel; Sekar, Aarthi; Serbzhinskiy, Dmitri; Lorimer, Don; Buchko, Garry W.; Stacy, Robin; Stewart, Lance J.; Edwards, Thomas E.; Van Voorhis, Wesley C.; Myler, Peter J.

    2015-01-01

    High-resolution three-dimensional structures of essential Mycobacterium tuberculosis (Mtb) proteins provide templates for TB drug design, but are available for only a small fraction of the Mtb proteome. Here we evaluate an intra-genus “homolog-rescue” strategy to increase the structural information available for TB drug discovery by using mycobacterial homologs with conserved active sites. Of 179 potential TB drug targets selected for x-ray structure determination, only 16 yielded a crystal structure. By adding 1675 homologs from nine other mycobacterial species to the pipeline, structures representing an additional 52 otherwise intractable targets were solved. To determine whether these homolog structures would be useful surrogates in TB drug design, we compared the active sites of 106 pairs of Mtb and non-TB mycobacterial (NTM) enzyme homologs with experimentally determined structures, using three metrics of active site similarity, including superposition of continuous pharmacophoric property distributions. Pair-wise structural comparisons revealed that 19/22 pairs with >55% overall sequence identity had active site Cα RMSD <1 Å, >85% side chain identity, and ≥80% PSAPF (similarity based on pharmacophoric properties) indicating highly conserved active site shape and chemistry. Applying these results to the 52 NTM structures described above, 41 shared >55% sequence identity with the Mtb target, thus increasing the effective structural coverage of the 179 Mtb targets over three-fold (from 9% to 32%). The utility of these structures in TB drug design can be tested by designing inhibitors using the homolog structure and assaying the cognate Mtb enzyme; a promising test case, Mtb cytidylate kinase, is described. The homolog-rescue strategy evaluated here for TB is also generalizable to drug targets for other diseases. PMID:25613812

  7. Homology groups for particles on one-connected graphs

    NASA Astrophysics Data System (ADS)

    Maciążek, Tomasz; Sawicki, Adam

    2017-06-01

    We present a mathematical framework for describing the topology of configuration spaces for particles on one-connected graphs. In particular, we compute the homology groups over integers for different classes of one-connected graphs. Our approach is based on some fundamental combinatorial properties of the configuration spaces, Mayer-Vietoris sequences for different parts of configuration spaces, and some limited use of discrete Morse theory. As one of the results, we derive the closed-form formulae for ranks of the homology groups for indistinguishable particles on tree graphs. We also give a detailed discussion of the second homology group of the configuration space of both distinguishable and indistinguishable particles. Our motivation is the search for new kinds of quantum statistics.

  8. Three Dimensional Structure of the MqsR:MqsA Complex: A Novel TA Pair Comprised of a Toxin Homologous to RelE and an Antitoxin with Unique Properties

    PubMed Central

    Kim, Younghoon; Arruda, Jennifer M.; Davenport, Andrew; Wood, Thomas K.; Peti, Wolfgang; Page, Rebecca

    2009-01-01

    One mechanism by which bacteria survive environmental stress is through the formation of bacterial persisters, a sub-population of genetically identical quiescent cells that exhibit multidrug tolerance and are highly enriched in bacterial toxins. Recently, the Escherichia coli gene mqsR (b3022) was identified as the gene most highly upregulated in persisters. Here, we report multiple individual and complex three-dimensional structures of MqsR and its antitoxin MqsA (b3021), which reveal that MqsR:MqsA form a novel toxin:antitoxin (TA) pair. MqsR adopts an α/β fold that is homologous to the RelE/YoeB family of bacterial ribonuclease toxins. MqsA is an elongated dimer that neutralizes MqsR toxicity. As expected for a TA pair, MqsA binds its own promoter. Unexpectedly, it also binds the promoters of genes important for E. coli physiology (e.g., mcbR, spy). Unlike canonical antitoxins, MqsA is also structured throughout its entire sequence, binds zinc, and coordinates DNA via its C-terminal rather than its N-terminal domain. These studies reveal that TA systems, especially the antitoxins, are significantly more diverse than previously recognized and provide new insights into the role of toxins in maintaining the persister state. PMID:20041169

  9. A putative lateral flagella of the cystic fibrosis pathogen Burkholderia dolosa regulates swimming motility and host cytokine production

    PubMed Central

    Clark, Bradley S.; Weatherholt, Molly; Renaud, Diane; Scott, David; LiPuma, John J.; Priebe, Gregory; Gerard, Craig

    2018-01-01

    Burkholderia dolosa caused an outbreak in the cystic fibrosis clinic at Boston Children’s Hospital and was associated with high mortality in these patients. This species is part of a larger complex of opportunistic pathogens known as the Burkholderia cepacia complex (Bcc). Compared to other species in the Bcc, B. dolosa is highly transmissible; thus, understanding its virulence mechanisms is important for preventing future outbreaks. The genome of one of the outbreak strains, AU0158, revealed a homolog of the lafA gene encoding a putative lateral flagellin, which, in other non-Bcc species, is used for movement on solid surfaces, attachment to host cells, or movement inside host cells. Here, we analyzed the conservation of the lafA gene and protein sequences, which are distinct from those of the polar flagella, and found lafA homologs to be present in numerous β-proteobacteria but notably absent from most other Bcc species. A lafA deletion mutant in B. dolosa showed greater swimming motility than the wild type due to an increase in the number of polar flagella, but lafA did not appear to contribute to biofilm formation, host cell invasion, or murine lung colonization or persistence over time. However, the lafA gene was important for cytokine production in human peripheral blood mononuclear cells, suggesting it may have a role in recognition by the human immune response. PMID:29346379

  10. A conserved segmental duplication within ELA.

    PubMed

    Brinkmeyer-Langford, C L; Murphy, W J; Childers, C P; Skow, L C

    2010-12-01

    The assembled genomic sequence of the horse major histocompatibility complex (MHC) (equine lymphocyte antigen, ELA) is very similar to the homologous human HLA, with the notable exception of a large segmental duplication at the boundary of ELA class I and class III that is absent in HLA. The segmental duplication consists of a ∼710 kb region of at least 11 repeated blocks: 10 blocks each contain an MHC class I-like sequence and the helicase domain portion of a BAT1-like sequence, and the remaining unit contains the full-length BAT1 gene. Similar genomic features were found in other Perissodactyls, indicating an ancient origin, which is consistent with phylogenetic analyses. Reverse-transcriptase PCR (RT-PCR) of mRNA from peripheral white blood cells of healthy and chronically or acutely infected horses detected transcription from predicted open reading frames in several of the duplicated blocks. This duplication is not present in the sequenced MHCs of most other mammals, although a similar feature at the same relative position is present in the feline MHC (FLA). Striking sequence conservation throughout Perissodactyl evolution is consistent with a functional role for at least some of the genes included within this segmental duplication. © 2010 The Authors, Journal compilation © 2010 Stichting International Foundation for Animal Genetics.

  11. RaptorX-Property: a web server for protein structure property prediction.

    PubMed

    Wang, Sheng; Li, Wei; Liu, Shiwang; Xu, Jinbo

    2016-07-08

    RaptorX Property (http://raptorx2.uchicago.edu/StructurePropertyPred/predict/) is a web server that predicts structural properties of a protein sequence without using any templates. It outperforms other servers, especially for proteins without close homologs in the PDB or with a very sparse sequence profile (i.e., one that carries little evolutionary information). This server employs a powerful in-house deep learning model, DeepCNF (Deep Convolutional Neural Fields), to predict secondary structure (SS), solvent accessibility (ACC) and disordered regions (DISO). DeepCNF models not only complex sequence-structure relationships through a deep hierarchical architecture but also the interdependency between adjacent property labels. Our experimental results show that, tested on CASP10, CASP11 and other benchmarks, this server obtains ∼84% Q3 accuracy for 3-state SS, ∼72% Q8 accuracy for 8-state SS, ∼66% Q3 accuracy for 3-state solvent accessibility, and ∼0.89 area under the ROC curve (AUC) for disorder prediction. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
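    The Q3 and Q8 figures quoted above are per-residue accuracies: the fraction of residues whose predicted state matches the reference state. A minimal sketch, assuming one-letter DSSP-style labels. The 8-to-3 reduction shown (H, G, I → helix; E, B → strand; everything else → coil) is one common convention and is not necessarily the one used for RaptorX-Property. The helper names are hypothetical.

```python
# One common 8-to-3 reduction of DSSP states (conventions vary between studies).
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",      # helices -> H
    "E": "E", "B": "E",                # strands and bridges -> E
    "T": "C", "S": "C", "L": "C",      # turns, bends, loops -> coil
    "C": "C", "-": "C", " ": "C",      # explicit or blank coil -> coil
}


def to_three_state(labels: str) -> str:
    """Collapse an 8-state label string to 3 states; unknown labels raise KeyError."""
    return "".join(EIGHT_TO_THREE[c] for c in labels)


def q_accuracy(predicted: str, reference: str) -> float:
    """Fraction of residues whose predicted state equals the reference state
    (Q3 for 3-state strings, Q8 for 8-state strings)."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty label strings of equal length")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)
```

    For example, `q_accuracy("HHEC", "HHCC")` gives 0.75, because three of the four residues agree. Reducing both 8-state strings with `to_three_state` before scoring gives a Q3 value from 8-state predictions.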

  12. Variability Studies of Two Prunus-Infecting Fabaviruses with the Aid of High-Throughput Sequencing

    PubMed Central

    Sarkisova, Tatiana; Lenz, Ondřej; Přibylová, Jaroslava; Špak, Josef; Lotos, Leonidas; Beta, Christina; Katsiani, Asimina; Candresse, Thierry

    2018-01-01

    During their lifetime, perennial woody plants are expected to face multiple infection events. Furthermore, multiple genotypes of individual virus species may co-infect the same host. This may eventually lead to a situation where plants harbor complex communities of viral species/strains. Using high-throughput sequencing, we describe co-infection of sweet and sour cherry trees with diverse genomic variants of two closely related viruses, namely prunus virus F (PrVF) and cherry virus F (CVF). Both viruses are most closely related to members of the genus Fabavirus (family Secoviridae). The comparison of CVF and PrVF RNA2 genomic sequences suggests that the two viruses may differ significantly in their expression strategy. Indeed, similar to comoviruses, the smaller genomic segment of PrVF, RNA2, may be translated into two collinear proteins, while CVF likely expresses only the shorter of these two proteins. Together with the observation that identity levels between the coat proteins of these two viruses are significantly below the family species demarcation cut-off, these findings support the idea that CVF and PrVF represent two separate Fabavirus species. PMID:29670059

  13. Homology of pendrin, sodium-iodide symporter and apical iodide transporter.

    PubMed

    Benvenga, Salvatore; Guarneri, Fabrizio

    2018-06-01

    We observed local homology between human pendrin and the sodium/iodide symporter (NIS), which was absent in the NIS-homologous sodium/monocarboxylate transporter, or apical iodide transporter (AIT), which, however, does not transport iodide. We therefore analyzed the full proteins. They shared 63 identical and 66 similar residues (overall homology 14.4%, but 21% when omitting intervening sequences of 15 or more residues). Pendrin was more homologous to NIS (25%) than to AIT (20%), particularly in the STAS domain (sulfate transporter and antisigma factor antagonist). Homology was concentrated in 11 segments, 3 of which involved the STAS domain. In 9 of the 11 segments, homology was greater with NIS (45-58.3%) than with AIT (8.3-42.3%); in 4 of these 9 segments, homology was comparable to or greater than that between NIS and AIT (8.3-52.6%). Pendrin residues that are mutated in Pendred syndrome (PDS) are identical to those in the aligned positions of NIS and AIT. Hypothyroidism-associated pendrin mutations almost always fall within 4 of the 11 segments. These are the first data showing homology between pendrin and NIS, and topographic relationships between pendrin mutations and the hypothyroid phenotype of PDS.

  14. Characterization of Centromeric Histone H3 (CENH3) Variants in Cultivated and Wild Carrots (Daucus sp.)

    PubMed Central

    Dunemann, Frank; Schrader, Otto; Budahn, Holger; Houben, Andreas

    2014-01-01

    In eukaryotes, centromeres are the assembly sites for the kinetochore, a multi-protein complex to which spindle microtubules are attached at mitosis and meiosis, thereby ensuring segregation of chromosomes during cell division. They are specified by incorporation of CENH3, a centromere-specific histone H3 variant that replaces canonical histone H3 in the nucleosomes of functional centromeres. To lay the foundation for a possible alternative haploidization strategy based on centromere-mediated genome elimination in cultivated carrots, this study aimed to identify and clone functional CENH3 genes in Daucus carota and three distantly related wild species of the genus Daucus that vary in basic chromosome number. By mining the carrot transcriptome followed by PCR-based cloning, homologous coding sequences for the CENH3s of the four Daucus species were identified. The ORFs of the CENH3 variants were very similar, and an amino acid sequence length of 146 aa was found in three of the four species. Comparison of Daucus CENH3 amino acid sequences with those of other plant CENH3s, as well as their phylogenetic placement among other dicot CENH3s, suggests that the identified genes are authentic CENH3 homologs. To verify the location of the CENH3 protein in the kinetochore regions of the Daucus chromosomes, a polyclonal antibody based on a peptide corresponding to the N-terminus of DcCENH3 was developed and used for anti-CENH3 immunostaining of mitotic root cells. The chromosomal location of CENH3 proteins in the centromere regions of the chromosomes was confirmed. For genetic localization of the CENH3 gene in the carrot genome, a previously constructed linkage map for carrot was used to map a CENH3-specific simple sequence repeat (SSR) marker, and the CENH3 locus was mapped to carrot chromosome 9. PMID:24887084

  15. Complete Sequencing of pNDM-HK Encoding NDM-1 Carbapenemase from a Multidrug-Resistant Escherichia coli Strain Isolated in Hong Kong

    PubMed Central

    Ho, Pak Leung; Lo, Wai U.; Yeung, Man Kiu; Lin, Chi Ho; Chow, Kin Hung; Ang, Irene; Tong, Amy Hin Yan; Bao, Jessie Yun-Juan; Lok, Si; Lo, Janice Yee Chi

    2011-01-01

    Background The emergence of plasmid-mediated carbapenemases, such as NDM-1, in Enterobacteriaceae is a major public health issue. Since they mediate resistance to virtually all β-lactam antibiotics and there is often co-resistance to other antibiotic classes, the therapeutic options for infections caused by these organisms are very limited. Methodology We characterized the first NDM-1-producing E. coli isolate recovered in Hong Kong. The plasmid encoding the metallo-β-lactamase gene was sequenced. Principal Findings The plasmid, pNDM-HK, readily transferred to E. coli J53 at high frequencies. It belongs to the broad-host-range IncL/M incompatibility group and is 88,803 bp in size. Sequence alignment showed that pNDM-HK has a 55 kb backbone, which shares 97% homology with pEL60 originating from the plant pathogen Erwinia amylovora in Lebanon, and a 28.9 kb variable region. The plasmid backbone includes the mucAB genes mediating ultraviolet light resistance. The 28.9 kb region has a composite transposon-like structure that includes intact or truncated genes associated with resistance to β-lactams (blaTEM-1, blaNDM-1, ΔblaDHA-1), aminoglycosides (aacC2, armA), sulphonamides (sul1) and macrolides (mel, mph2). It also harbors the following mobile elements: IS26, ISCR1, tnpU, tnpAcp2, tnpD, ΔtnpATn1 and insL. Certain blocks within the 28.9 kb variable region had homology with the corresponding sequences in the widely disseminated plasmids pCTX-M3, pMUR050 and pKP048, originating from bacteria in Poland in 1996, in Spain in 2002 and in China in 2006, respectively. Significance The genetic context of the NDM-1 gene suggests that it has evolved through complex pathways. The association with a broad-host-range plasmid and multiple mobile genetic elements explains its observed horizontal mobility in multiple bacterial taxa. PMID:21445317

  16. Identification of a novel mitochondrial complex I assembly factor ACDH-12 in Caenorhabditis elegans.

    PubMed

    Chuaijit, Sirithip; Boonyatistan, Worawit; Boonchuay, Pichsinee; Metheetrairut, Chanatip; Suthammarak, Wichit

    2018-03-11

    Assembly of complex I of the mitochondrial respiratory chain (MRC) requires not only structural subunits for electron transport but also assembly factors. In the nematode Caenorhabditis elegans, NUAF-1 and NUAF-3 are the only two assembly factors that have been characterized. In this study, we identify ACDH-12 as an assembly factor of respiratory complex I. We demonstrate for the first time that ACDH-12 deficiency affects the formation and function of complex I. RNAi knockdown of acdh-12 also shortens lifespan and decreases fecundity. Although ACDH-12 has long been recognized as a very long-chain acyl-CoA dehydrogenase (VLCAD), the knockdown nematodes did not show any change in body fat content. We suggest that in Caenorhabditis elegans, ACDH-12 is required for the assembly of respiratory complex I but may not be crucial for fatty acid oxidation. Interestingly, sequence analysis shows high homology between ACDH-12 and human ACAD9, a protein that was initially identified as a VLCAD but was later found to be involved in complex I assembly in humans as well. Copyright © 2018 Elsevier B.V. and Mitochondria Research Society. All rights reserved.

  17. Sequence and structure insights of kazal type thrombin inhibitor protein: Studied with phylogeny, homology modeling and dynamic MM/GBSA studies.

    PubMed

    Jadhav, Aparna; Dash, RadhaCharan; Hirwani, Raj; Abdin, Malik

    2018-03-01

    Despite the wide medical importance of serine protease inhibitors, many Kazal-type proteins remain unexplored. These thrombin-inhibiting proteins are found in the digestive system of hematophagous organisms, mainly arthropods. We studied one such protein, the Kazal type-1 protein from the sand fly Phlebotomus papatasi, because its structure and interaction with thrombin are unclear. First, Dipetalin, a Kazal-follistatin domain protein, was run through PSI-BLAST to retrieve related sequences. Using this set of sequences, a phylogenetic tree was constructed, which identified a distantly related Kazal type-1 protein. A three-dimensional structure was predicted for this protein and aligned with rhodniin for further evaluation. To compare their binding at the thrombin active site, the aligned Kazal model-thrombin and rhodniin-thrombin complexes were subjected to molecular dynamics simulations. Dynamics analysis of main-chain RMSD, H-chain residue RMSF and total energy showed the rhodniin-thrombin complex to be the more stable system. Further, the MM/GBSA method gave binding free energies (ΔGbinding) of −220.32 kcal/mol for rhodniin and −90.70 kcal/mol for the Kazal model. This indicates that the Kazal model binds thrombin more weakly than rhodniin does. Copyright © 2017 Elsevier B.V. All rights reserved.
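    The two MM/GBSA values above can be combined into a single relative difference, which makes the "weaker binding" conclusion explicit. The symbol ΔΔG_bind is introduced here only for illustration and is not used in the original study; a positive value means the Kazal model is predicted to bind thrombin less favorably than rhodniin.

```latex
\Delta\Delta G_{\text{bind}}
  = \Delta G_{\text{bind}}^{\text{Kazal model}} - \Delta G_{\text{bind}}^{\text{rhodniin}}
  = -90.70 - (-220.32)
  = 129.62\ \text{kcal/mol}
```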

  18. Cellobiose dehydrogenase of Chaetomium sp. INBI 2-26(-): structural basis of enhanced activity toward glucose at neutral pH.

    PubMed

    Vasilchenko, Liliya G; Karapetyan, Karen N; Yershevich, Olga P; Ludwig, Roland; Zamocky, Marcel; Peterbauer, Clemens K; Haltrich, Dietmar; Rabinovich, Mikhail L

    2011-05-01

    Cellobiose dehydrogenase (CDH) is an extracellular fungal flavocytochrome that specifically oxidizes cellooligosaccharides and lactose to the corresponding δ-lactones, using a variety of electron acceptors. In contrast to basidiomycetous CDHs, CDHs of ascomycetes also show some activity toward glucose. The objective of this study was to establish the structural basis of this activity in the CDH from the mesophilic ascomycete Chaetomium sp. INBI 2-26 (ChCDH). The complete amino acid sequence of ChCDH showed high similarity to those of CDHs from the thermophilic fungi Thielavia heterothallica and Myriococcum thermophilum. Peptide mass fingerprinting of purified ChCDH provided evidence for the oxidation of methionine residues in the FAD domain. The structure of the ChCDH FAD domain in complex with a transition-state analog was built by comparative homology modeling, using the structure of the same complex of a basidiomycetous CDH (1NAA) as the template. The model indicated possible structural reasons for the enhanced activity of ascomycetous CDHs toward glucose at neutral pH, which is a prerequisite for applying CDH in a variety of biocompatible biosensors and biofuel cells. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  19. Host Cell Virus Entry Mediated by Australian Bat Lyssavirus Envelope G glycoprotein

    DTIC Science & Technology

    2013-10-24

    Figure 7. Comparison of the amino acid sequences of Saccolaimus and Pteropus ABLV G mature protein ... sequence analysis revealed that the PCR products were identical. Sequence comparisons of the ABLV N and other lyssavirus N proteins showed that ABLV ... Saccolaimus flaviventris) (129). Nucleoprotein sequence comparisons revealed that the Saccolaimus N protein shared 96% amino acid homology with the Pteropus

  20. A novel mutation in TFL1 homolog affecting determinacy in cowpea (Vigna unguiculata).

    PubMed

    Dhanasekar, P; Reddy, K S

    2015-02-01

    Mutations in the widely conserved Arabidopsis Terminal Flower 1 (TFL1) gene and its homologs have been shown to result in determinacy across genera, but such knowledge is lacking for cowpea. Understanding the molecular events leading to determinacy of apical meristems could hasten the development of cowpea varieties with suitable ideotypes. The isolation and characterization of a novel mutation in the cowpea TFL1 homolog (VuTFL1) that affects determinacy is reported here for the first time. The cowpea TFL1 homolog was amplified using primers designed from conserved sequences in related genera, and sequence variation was analysed in three gamma ray-induced determinate mutants, their indeterminate parent "EC394763" and two indeterminate varieties. The analysis of sequence variation revealed a novel SNP distinguishing the determinate mutants from the indeterminate types. This non-synonymous point mutation in exon 4, at position 1,176, is a cytosine (C) to adenine (A) transversion that changes Pro-136 to His in the determinate mutants. Several bioinformatics/computational tools predicted the mutation to be detrimental to protein function and stability. This functionally significant novel substitution is hypothesized to affect determinacy in the cowpea mutants. Developing suitable regeneration protocols for this hitherto recalcitrant crop, followed by a complementation assay in the mutants or an overexpression assay in the parents, could conclusively establish the role of the SNP in regulating determinacy in these cowpea mutants.
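    The Pro-to-His change reported above follows from the standard genetic code. The abstract does not state the affected codon or the position of the change within it, so the minimal sketch below (using a hypothetical helper, `pro_to_his_via_c_to_a`, over a small subset of the standard DNA codon table) only enumerates which single C-to-A changes in a proline codon can yield histidine. It is an illustration of the codon logic, not the authors' analysis.

```python
# Subset of the standard genetic code (DNA codons): every Pro and His codon,
# plus the Gln codons reachable from Pro by a single C->A change.
# Any mutant codon absent from this table is not a His codon, because all
# His codons (CAT, CAC) are listed.
CODON_TABLE = {
    "CCT": "Pro", "CCC": "Pro", "CCA": "Pro", "CCG": "Pro",
    "CAT": "His", "CAC": "His",
    "CAA": "Gln", "CAG": "Gln",
}


def substitute(codon: str, pos: int, new_base: str) -> str:
    """Return the codon with the base at 0-based position pos replaced."""
    return codon[:pos] + new_base + codon[pos + 1:]


def pro_to_his_via_c_to_a() -> list[tuple[str, int, str]]:
    """List (Pro codon, 1-based codon position, mutant codon) for every
    single C->A transversion that turns a proline codon into histidine."""
    hits = []
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid != "Pro":
            continue
        for pos, base in enumerate(codon):
            if base != "C":
                continue
            mutant = substitute(codon, pos, "A")
            if CODON_TABLE.get(mutant) == "His":
                hits.append((codon, pos + 1, mutant))
    return hits


if __name__ == "__main__":
    for codon, pos, mutant in pro_to_his_via_c_to_a():
        print(f"{codon} -> {mutant} (C>A at codon position {pos})")
```

    Running it prints `CCT -> CAT (C>A at codon position 2)` and `CCC -> CAC (C>A at codon position 2)`: under the standard code, a C-to-A change can produce Pro-to-His only at the second position of a CCT or CCC codon.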
